Hi Yanbo,

I think it was happening because some of the rows did not have all the columns. We are cleaning up the data and will let you know once we confirm this.

Date: Thu, 14 Aug 2014 22:50:58 +0800
Subject: Re: java.lang.UnknownError: no bin was found for continuous variable.
From: yanboha...@gmail.com
To: ssti...@live.com
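The cleanup described above can be sketched as follows. This is a minimal, hypothetical example of dropping malformed rows before training, assuming comma-delimited input where every row should have a fixed number of columns; the column count, delimiter, and object name are assumptions, not from the original data, and in Spark the same `map`/`filter`/`flatMap` chain would run on an RDD of lines.

```scala
// Hypothetical cleanup sketch: discard rows that do not have the expected
// number of columns, or whose fields do not parse as numbers, before
// building feature vectors. expectedCols and the delimiter are assumptions.
object CleanRows {
  val expectedCols = 4 // assumed width: label plus three features

  def clean(lines: Seq[String]): Seq[Array[Double]] =
    lines
      .map(_.split(",", -1))            // -1 keeps trailing empty fields visible
      .filter(_.length == expectedCols) // drop short or long rows
      .flatMap { cols =>
        // also drop rows where any field fails to parse as a Double
        try Some(cols.map(_.toDouble))
        catch { case _: NumberFormatException => None }
      }

  def main(args: Array[String]): Unit = {
    val raw = Seq("1.0,2.0,3.0,4.0", "0.0,5.0", "1.0,a,b,c")
    println(clean(raw).length) // only the first row survives
  }
}
```

On an RDD the same predicate would be applied with `rdd.map(...).filter(...)` before constructing `LabeledPoint`s, so the decision tree never sees a row with missing columns.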
Could you supply the detailed code and data you used? From the log, it looks like it cannot find the bin for a specific feature. A bin for a continuous feature is a unit that covers a specific range of that feature's values.

2014-08-14 7:43 GMT+08:00 Sameer Tilak <ssti...@live.com>:

Hi All,

I am using the decision tree algorithm and I get the following error. Any help would be great!

java.lang.UnknownError: no bin was found for continuous variable.
	at org.apache.spark.mllib.tree.DecisionTree$.findBin$1(DecisionTree.scala:492)
	at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findBinsForLevel$1(DecisionTree.scala:529)
	at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
	at org.apache.spark.mllib.tree.DecisionTree$$anonfun$3.apply(DecisionTree.scala:653)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
	at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
	at org.apache.spark.rdd.RDD$$anonfun$21.apply(RDD.scala:838)
	at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
	at org.apache.spark.SparkContext$$anonfun$23.apply(SparkContext.scala:1116)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
14/08/13 16:36:06 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.UnknownError: no bin was found for continuous variable.
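To make the notion of a bin concrete, here is a simplified, hypothetical sketch of how a continuous feature value can be mapped to a bin by comparing it against sorted split thresholds. This is not the actual MLlib implementation, and the threshold values are made up; it only illustrates how a value that matches no bin (for instance, one produced by a malformed row) can surface as "no bin was found for continuous variable."

```scala
// Simplified, hypothetical bin lookup for a continuous feature.
// splits holds sorted thresholds; bin i covers values <= splits(i).
// The threshold values below are assumptions for illustration only.
object BinLookup {
  val splits = Array(0.5, 1.5, 2.5)

  // Strict lookup: returns the bin index, or -1 when no bin matches,
  // which mirrors the "no bin was found" failure mode.
  def findBinStrict(value: Double): Int = {
    var i = 0
    while (i < splits.length) {
      if (value <= splits(i)) return i
      i += 1
    }
    -1 // value larger than every split: no bin was found
  }

  // Tolerant variant: values beyond the last split fall into the last bin.
  def findBin(value: Double): Int = {
    val i = splits.indexWhere(value <= _)
    if (i >= 0) i else splits.length - 1
  }

  def main(args: Array[String]): Unit = {
    println(findBinStrict(1.0))  // falls in bin 1
    println(findBinStrict(99.0)) // -1: would trigger the error
    println(findBin(99.0))       // tolerant lookup maps it to the last bin
  }
}
```

The practical takeaway is that the bin boundaries are derived from the training data, so rows with missing or unparsable columns can yield feature values that fall outside every computed bin; cleaning the input, as discussed above, removes that condition.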