[ https://issues.apache.org/jira/browse/SPARK-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323280#comment-14323280 ]

Joseph K. Bradley commented on SPARK-5809:
------------------------------------------

I agree with [~srowen]'s assessment. The problem should only occur if both (a) logging is at the debug level and (b) you have a very large number of features. How many features do you have?

We could change the code to print one debug message per feature (a loop with logDebug inside it). If we do that, we should probably check the logging level first so the loop is skipped entirely when debug-level messages are not being printed.
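A minimal sketch of that idea, with stand-ins for Spark's Logging trait so it is self-contained (the names `isDebugEnabled`, `logDebug`, and the `numBins` array mirror Spark's Logging trait and DecisionTreeMetadata, but are assumptions here, not the exact RandomForest.scala code):

```scala
// Sketch of the proposed fix: instead of building one huge mkString of
// every feature's bin count (the single allocation that triggers the
// OOM), emit one short logDebug line per feature, and guard the loop so
// it costs nothing when debug logging is off.
object DebugLoopSketch {
  // Hypothetical stand-ins for Spark's Logging trait members.
  var isDebugEnabled: Boolean = false
  val logged = scala.collection.mutable.ArrayBuffer.empty[String]
  def logDebug(msg: => String): Unit = if (isDebugEnabled) logged += msg

  def logBinCounts(numBins: Array[Int]): Unit = {
    // Skip the whole loop (and all string building) unless debugging,
    // so no string proportional to the number of features is ever built
    // when logging is at a higher level.
    if (isDebugEnabled) {
      numBins.zipWithIndex.foreach { case (bins, feature) =>
        logDebug(s"feature $feature: number of bins = $bins")
      }
    }
  }
}
```

With debug disabled the method returns immediately; with it enabled, each feature produces one small, short-lived message rather than contributing to a single giant string.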

> OutOfMemoryError in logDebug in RandomForest.scala
> --------------------------------------------------
>
>                 Key: SPARK-5809
>                 URL: https://issues.apache.org/jira/browse/SPARK-5809
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Devesh Parekh
>            Assignee: Joseph K. Bradley
>            Priority: Minor
>              Labels: easyfix
>
> When training a GBM on sparse vectors produced by HashingTF, I get the 
> following OutOfMemoryError, where RandomForest is building a debug string to 
> log.
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3326)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>         at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
>         at java.lang.StringBuilder.append(StringBuilder.java:136)
>         at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:197)
>         at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:327)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:320)
>         at scala.collection.AbstractTraversable.addString(Traversable.scala:105)
>         at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:286)
>         at scala.collection.AbstractTraversable.mkString(Traversable.scala:105)
>         at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:288)
>         at scala.collection.AbstractTraversable.mkString(Traversable.scala:105)
>         at org.apache.spark.mllib.tree.RandomForest$$anonfun$run$9.apply(RandomForest.scala:152)
>         at org.apache.spark.mllib.tree.RandomForest$$anonfun$run$9.apply(RandomForest.scala:152)
>         at org.apache.spark.Logging$class.logDebug(Logging.scala:63)
>         at org.apache.spark.mllib.tree.RandomForest.logDebug(RandomForest.scala:67)
>         at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:150)
>         at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:64)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
> A workaround until this is fixed is to raise the log level for RandomForest in log4j.properties in the conf directory, so its debug logs are filtered out. For example:
> log4j.logger.org.apache.spark.mllib.tree.RandomForest=WARN



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
