[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng resolved SPARK-5119.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.3.0

Issue resolved by pull request 3975
[https://github.com/apache/spark/pull/3975]

> java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-5119
>                 URL: https://issues.apache.org/jira/browse/SPARK-5119
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 1.1.0, 1.2.0
>         Environment: Linux ubuntu 14.04
>            Reporter: Vivek Kulkarni
>             Fix For: 1.3.0
>
>
> First I checked whether a bug with a similar trace had been raised before. I
> found https://www.mail-archive.com/user@spark.apache.org/msg13708.html but
> the suggestion to upgrade to the latest code base (I cloned from the master
> branch) does not fix this issue.
>
> Issue: try to train a decision tree classifier on some data. After training,
> when it begins the collect, it crashes:
>
> 15/01/06 22:28:15 INFO BlockManagerMaster: Updated info of block rdd_52_1
> 15/01/06 22:28:15 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 1895)
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.spark.mllib.tree.impurity.GiniAggregator.update(Gini.scala:93)
>     at org.apache.spark.mllib.tree.impl.DTStatsAggregator.update(DTStatsAggregator.scala:100)
>     at org.apache.spark.mllib.tree.DecisionTree$.orderedBinSeqOp(DecisionTree.scala:419)
>     at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$nodeBinSeqOp$1(DecisionTree.scala:511)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:536)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:533)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>     at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:533)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
>     at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> Minimal code:
>
> data = MLUtils.loadLibSVMFile(sc, '/scratch1/vivek/datasets/private/a1a').cache()
> model = DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)
>
> Just download the data from:
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a1a
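
One possible reading of the trace, which the message above does not confirm: the a1a file labels its examples -1 and +1, while MLlib's classification trees expect labels in 0 .. numClasses-1. A label of -1 used as a class index would be consistent with the `ArrayIndexOutOfBoundsException: -1` raised in `GiniAggregator.update`. Under that assumption, a minimal workaround sketch is to remap the labels before training. `to_class_index` is a hypothetical helper introduced here for illustration, not part of MLlib, and this is not necessarily what pull request 3975 does.

```python
def to_class_index(label):
    """Map a binary label in {-1, +1} (or already in {0, 1}) to 0.0 / 1.0.

    Hypothetical helper, not an MLlib API. It assumes the crash comes from
    the -1 labels in a1a being used directly as class indices.
    """
    if label in (0.0, 1.0):
        return float(label)
    if label == -1.0:
        return 0.0
    raise ValueError("unexpected label: %r" % (label,))


# Usage with PySpark (kept as comments so the helper runs without a cluster;
# `data` is the RDD of LabeledPoint from the minimal code in the report):
#   from pyspark.mllib.regression import LabeledPoint
#   fixed = data.map(lambda p: LabeledPoint(to_class_index(p.label), p.features))
#   model = DecisionTree.trainClassifier(fixed, numClasses=2,
#                                        categoricalFeaturesInfo={},
#                                        maxDepth=5, maxBins=100)
```

The helper raises on any label outside {-1, 0, +1} rather than guessing, so an unexpected label surfaces before training instead of inside the aggregator.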