Re: MLlib: issue with increasing maximum depth of the decision tree
Hi Sameer,

http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Decision-Tree-not-getting-built-for-5-or-more-levels-maxDepth-5-and-the-one-built-for-3-levelsy-td7401.html

Thanks and Regards,
Suraj Sheth

On Thu, Aug 21, 2014 at 10:52 PM, Sameer Tilak wrote:
> Resending this:
>
> Hi All,
>
> My dataset is fairly small -- a CSV file with around half million rows and
> 600 features. Everything works when I set maximum depth of the decision
> tree to 5 or 6. However, I get this error for larger values of that
> parameter -- For example when I set it to 10. Have others encountered a
> similar issue?
>
> 14/08/20 10:27:26 INFO TaskSetManager: Serialized task 5.0:390 as 400933 bytes in 1 ms
> 14/08/20 10:27:26 WARN TaskSetManager: Lost TID 1194 (task 5.0:399)
> 14/08/20 10:27:26 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException: 178
>     at org.apache.spark.mllib.linalg.DenseVector.apply(Vectors.scala:163)
>     at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:444)
>     at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>     at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>     at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>     at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
>     at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
>     at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
>     at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
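
For readers wondering why a depth of 5 or 6 and a depth of 10 can behave so differently, the arithmetic below may help. It is only a sketch of how a binary tree grows with depth, not a description of MLlib's internals: the function names are hypothetical, the 600 features come from the post, and the bins-per-feature value is an assumed number chosen purely for illustration.

```python
def max_nodes_at_level(level: int) -> int:
    """Upper bound on the number of nodes at one level of a binary tree (root = level 0)."""
    return 2 ** level


def max_total_nodes(max_depth: int) -> int:
    """Upper bound on the total number of nodes when the deepest level is max_depth."""
    return 2 ** (max_depth + 1) - 1


def per_level_cells(level: int, num_features: int, bins_per_feature: int) -> int:
    """Hypothetical bookkeeping size: one cell per (node, feature, bin) at a level."""
    return max_nodes_at_level(level) * num_features * bins_per_feature


if __name__ == "__main__":
    NUM_FEATURES = 600      # taken from the post
    BINS_PER_FEATURE = 100  # assumption for illustration, not a value from the post

    for depth in (5, 6, 10):
        print(
            f"depth={depth}: "
            f"nodes at deepest level <= {max_nodes_at_level(depth)}, "
            f"total nodes <= {max_total_nodes(depth)}, "
            f"cells at deepest level <= "
            f"{per_level_cells(depth, NUM_FEATURES, BINS_PER_FEATURE)}"
        )
```

Under these assumptions the deepest level at depth 10 can hold 32 times as many nodes as the deepest level at depth 5, so any per-node, per-feature work grows by the same factor. That does not by itself explain the exception, but it shows why code paths that are never exercised on shallow trees can come into play on deeper ones.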
MLlib: issue with increasing maximum depth of the decision tree
Resending this:

Hi All,

My dataset is fairly small -- a CSV file with around half a million rows and 600 features. Everything works when I set the maximum depth of the decision tree to 5 or 6. However, I get this error for larger values of that parameter -- for example, when I set it to 10. Have others encountered a similar issue?

14/08/20 10:27:26 INFO TaskSetManager: Serialized task 5.0:390 as 400933 bytes in 1 ms
14/08/20 10:27:26 WARN TaskSetManager: Lost TID 1194 (task 5.0:399)
14/08/20 10:27:26 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: 178
    at org.apache.spark.mllib.linalg.DenseVector.apply(Vectors.scala:163)
    at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:444)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
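
The trace shows the failure happening while a feature value is read by index from a dense vector (`DenseVector.apply`, index 178). One inexpensive thing to rule out before digging into the tree code is whether every row of the CSV really has the same number of columns. This is a sanity check only, not a confirmed diagnosis, and it would not by itself explain why smaller depths succeed. The helper names below are hypothetical, and the sketch uses only the Python standard library rather than any Spark API.

```python
import csv
import io
from collections import Counter


def column_count_histogram(lines, delimiter=","):
    """Count how many rows have each column count."""
    reader = csv.reader(lines, delimiter=delimiter)
    return Counter(len(row) for row in reader)


def find_mismatched_rows(lines, expected_columns, delimiter=","):
    """Return (1-based row number, column count) for rows whose width differs."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [
        (row_number, len(row))
        for row_number, row in enumerate(reader, start=1)
        if len(row) != expected_columns
    ]


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; the second row is short.
    sample = "1,2,3\n4,5\n6,7,8\n"
    print(column_count_histogram(io.StringIO(sample)))
    print(find_mismatched_rows(io.StringIO(sample), expected_columns=3))
```

For a real file, an open file object can be passed in place of the `io.StringIO` sample, for example `with open(path, newline="") as f: find_mismatched_rows(f, expected_columns=601)`, where `path` and the expected width (600 features plus a label column, if the label is stored in the same file) are assumptions to adjust to the actual data.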