Re: MLlib: issue with increasing maximum depth of the decision tree

2014-08-21 Thread SURAJ SHETH
Hi Sameer,

A possibly related problem is discussed in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Decision-Tree-not-getting-built-for-5-or-more-levels-maxDepth-5-and-the-one-built-for-3-levelsy-td7401.html


Thanks and Regards,
Suraj Sheth


On Thu, Aug 21, 2014 at 10:52 PM, Sameer Tilak wrote:

> Resending this:
>
> Hi All,
>
> My dataset is fairly small: a CSV file with around half a million rows and
> 600 features. Everything works when I set the maximum depth of the decision
> tree to 5 or 6. However, I get this error for larger values of that
> parameter, for example when I set it to 10. Have others encountered a
> similar issue?
>
> 14/08/20 10:27:26 INFO TaskSetManager: Serialized task 5.0:390 as 400933 bytes in 1 ms
> 14/08/20 10:27:26 WARN TaskSetManager: Lost TID 1194 (task 5.0:399)
> 14/08/20 10:27:26 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException: 178
>     at org.apache.spark.mllib.linalg.DenseVector.apply(Vectors.scala:163)
>     at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:444)
>     at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>     at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>     at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>     at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
>     at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
>     at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
>     at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)


MLlib: issue with increasing maximum depth of the decision tree

2014-08-21 Thread Sameer Tilak
Resending this:

Hi All,

My dataset is fairly small: a CSV file with around half a million rows and
600 features. Everything works when I set the maximum depth of the decision
tree to 5 or 6. However, I get this error for larger values of that
parameter, for example when I set it to 10. Have others encountered a
similar issue?

14/08/20 10:27:26 INFO TaskSetManager: Serialized task 5.0:390 as 400933 bytes in 1 ms
14/08/20 10:27:26 WARN TaskSetManager: Lost TID 1194 (task 5.0:399)
14/08/20 10:27:26 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: 178
    at org.apache.spark.mllib.linalg.DenseVector.apply(Vectors.scala:163)
    at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:444)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

MLlib: issue with increasing maximum depth of the decision tree

2014-08-20 Thread Sameer Tilak
Hi All,

My dataset is fairly small: a CSV file with around half a million rows and
600 features. Everything works when I set the maximum depth of the decision
tree to 5 or 6. However, I get this error for larger values of that
parameter, for example when I set it to 10. Have others encountered a
similar issue?

14/08/20 10:27:26 INFO TaskSetManager: Serialized task 5.0:390 as 400933 bytes in 1 ms
14/08/20 10:27:26 WARN TaskSetManager: Lost TID 1194 (task 5.0:399)
14/08/20 10:27:26 WARN TaskSetManager: Loss was due to java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: 178
    at org.apache.spark.mllib.linalg.DenseVector.apply(Vectors.scala:163)
    at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:444)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
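

Note on the trace: the exception is raised in DenseVector.apply for index 178, which suggests that a vector with at most 178 values was indexed, although the dataset is described as having 600 features. One thing that may be worth ruling out (an assumption on my part, not something confirmed in this thread) is that some CSV rows parse into fewer columns than expected. Below is a minimal sketch in plain Python for checking row widths before training. The function name find_bad_rows, the comma delimiter, and the "one label column plus feature columns" layout are all hypothetical choices made for illustration.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str], expected_columns: int) -> List[Tuple[int, int]]:
    """Return (row_number, column_count) for each row whose width is unexpected.

    Row numbers start at 1. Blank lines are reported with a column count of 0.
    """
    bad = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != expected_columns:
            bad.append((row_number, len(row)))
    return bad


# Tiny in-memory example: 1 label column + 3 feature columns are expected,
# but the second row is missing one value.
sample = io.StringIO("1,0.5,0.2,0.1\n0,0.4,0.9\n1,0.3,0.8,0.7\n")
bad_rows = find_bad_rows(sample, expected_columns=4)
print(bad_rows)  # [(2, 3)]
```

For a real file, the same function could be called on a handle opened with open(path, newline=""), with expected_columns set to 601 if the file holds one label column followed by 600 features (again, an assumed layout). An empty result would rule out this particular cause, not explain the error.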