[ https://issues.apache.org/jira/browse/SPARK-17801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770463#comment-15770463 ]
Ilya Matiach commented on SPARK-17801:
--------------------------------------

Taking a look into the error

> [ML]Random Forest Regression fails for large input
> --------------------------------------------------
>
>                 Key: SPARK-17801
>                 URL: https://issues.apache.org/jira/browse/SPARK-17801
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 14.04
>            Reporter: samkit
>            Priority: Minor
>
> Random Forest Regression
> Data:https://www.kaggle.com/c/grupo-bimbo-inventory-demand/download/train.csv.zip
> Parameters:
> NumTrees:500 Maximum Bins:7477383 MaxDepth:27
> MinInstancesPerNode:8648 SamplingRate:1.0
> Java Options:
> "-Xms16384M" "-Xmx16384M" "-Dspark.locality.wait=0s"
> "-Dspark.driver.extraJavaOptions=-Xss10240k -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC -XX:ParallelGCThreads=2 -XX:-UseAdaptiveSizePolicy
> -XX:ConcGCThreads=2 -XX:-UseGCOverheadLimit
> -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=8g -XX:MaxNewSize=8g
> -XX:SurvivorRatio=3 -DnumPartitions=36" "-Dspark.submit.deployMode=cluster"
> "-Dspark.speculation=true" " "-Dspark.speculation.multiplier=2"
> "-Dspark.driver.memory=16g" "-Dspark.speculation.interval=300ms"
> "-Dspark.speculation.quantile=0.5" "-Dspark.akka.frameSize=768"
> "-Dspark.driver.supervise=false" "-Dspark.executor.cores=6"
> "-Dspark.executor.extraJavaOptions=-Xss10240k -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:+UseParallelOldGC
> -XX:ParallelGCThreads=6 -XX:NewSize=22g -XX:MaxNewSize=22g
> -XX:SurvivorRatio=2 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps"
> "-Dspark.rpc.askTimeout=10" "-Dspark.executor.memory=40g"
> "-Dspark.driver.maxResultSize=3g" "-Xss10240k" "-XX:+PrintGCDetails"
> "-XX:+PrintGCTimeStamps" "-XX:+PrintTenuringDistribution"
> "-XX:+UseConcMarkSweepGC" "-XX:+UseParNewGC" "-XX:ParallelGCThreads=2"
> "-XX:-UseAdaptiveSizePolicy" "-XX:ConcGCThreads=2" "-XX:-UseGCOverheadLimit"
> "-XX:CMSInitiatingOccupancyFraction=75" "-XX:NewSize=8g" "-XX:MaxNewSize=8g"
> "-XX:SurvivorRatio=3" "-DnumPartitions=36"
> Partial Driver StackTrace:
> org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:740)
> org.apache.spark.ml.tree.impl.RandomForest$.findBestSplits(RandomForest.scala:525)
> org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:160)
> org.apache.spark.ml.regression.CustomRandomForestRegressor.train(CustomRandomForestRegressor.scala:209)
> org.apache.spark.ml.regression.CustomRandomForestRegressor.train(CustomRandomForestRegressor.scala:197)
> org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
> org.apache.spark.ml.Estimator.fit(Estimator.scala:59)
> org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:78)
> org.apache.spark.ml.Estimator$$anonfun$fit$1.apply(Estimator.scala:78)
> For complete Executor and Driver ErrorLog
> https://gist.github.com/anonymous/603ac7f8f17e43c51ba93b2934cd4cb6

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
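
For anyone triaging the reported parameters, here is a rough sizing sketch for Maximum Bins:7477383. It is a back-of-the-envelope upper bound, not a measurement and not a diagnosis of this issue. It assumes a regression tree keeps three doubles per bin (count, sum, sum of squares) and that a feature really uses all 7477383 bins. Neither assumption is confirmed by the report, which also uses a CustomRandomForestRegressor that may differ from stock Spark. The function name and its defaults are hypothetical.

```python
# Rough upper bound on per-bin statistics storage; illustrative only.
BYTES_PER_DOUBLE = 8


def bin_stats_bytes(max_bins, stats_per_bin=3, num_features=1, num_nodes=1):
    """Bytes needed for per-bin statistics.

    stats_per_bin=3 is an ASSUMPTION (count, sum, sum of squares for a
    variance-based impurity); it is not taken from the report.
    """
    return max_bins * stats_per_bin * BYTES_PER_DOUBLE * num_features * num_nodes


MAX_BINS = 7477383  # "Maximum Bins" from the report

per_feature = bin_stats_bytes(MAX_BINS)
print("%d bytes (about %.0f MiB) per feature per node, worst case"
      % (per_feature, per_feature / 1024.0 ** 2))
```

Multiply by the number of features and by the number of nodes processed in one pass to get a worst-case total. The real bin count per feature can be much lower than the configured maximum, so treat the figure as a ceiling.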