[jira] [Commented] (SPARK-5119) java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294046#comment-14294046 ]

Nicolas Garneau commented on SPARK-5119:

Hey guys, I am wondering what you think about letting the user control whether their feature vectors are 0-based or 1-based. I used to have 0-based vectors in my datasets (I worked a lot with scikit-learn), and I saw that the loadLibSVMFile function converts all vectors to 0-based indices. I thought it would be useful to add an optional parameter for this. Thanks guys, I'd be glad to give you some help :)

java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
---
Key: SPARK-5119
URL: https://issues.apache.org/jira/browse/SPARK-5119
Project: Spark
Issue Type: Bug
Components: ML, MLlib
Affects Versions: 1.1.0, 1.2.0
Environment: Linux ubuntu 14.04
Reporter: Vivek Kulkarni
Assignee: Kai Sasaki
Fix For: 1.3.0

First I tried to see if a bug with a similar trace had been raised before. I found https://www.mail-archive.com/user@spark.apache.org/msg13708.html, but the suggestion to upgrade to the latest code base (I cloned from the master branch) does not fix this issue.

Issue: try to train a decision tree classifier on some data. After training, when collection begins, it crashes:

15/01/06 22:28:15 INFO BlockManagerMaster: Updated info of block rdd_52_1
15/01/06 22:28:15 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 1895)
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.spark.mllib.tree.impurity.GiniAggregator.update(Gini.scala:93)
    at org.apache.spark.mllib.tree.impl.DTStatsAggregator.update(DTStatsAggregator.scala:100)
    at org.apache.spark.mllib.tree.DecisionTree$.orderedBinSeqOp(DecisionTree.scala:419)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$nodeBinSeqOp$1(DecisionTree.scala:511)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:536)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:533)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
    at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:533)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
    at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)

Minimal code:

data = MLUtils.loadLibSVMFile(sc, '/scratch1/vivek/datasets/private/a1a').cache()
model = DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)

Just download the data from: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a1a

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
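For context on the comment above, a minimal pure-Python sketch of the index conversion loadLibSVMFile performs, extended with the kind of optional parameter Nicolas Garneau proposes. The `zero_based` flag is hypothetical (Spark's actual API does not take it); the default path mirrors the documented behavior of shifting 1-based LIBSVM indices down to 0-based.

```python
# Sketch of LIBSVM-line parsing with a hypothetical zero_based option.
# Spark's real MLUtils.loadLibSVMFile always assumes 1-based indices.
def parse_libsvm_line(line, zero_based=False):
    """Parse 'label idx:val idx:val ...' into (label, {0-based idx: val})."""
    parts = line.strip().split()
    label = float(parts[0])
    offset = 0 if zero_based else 1  # shift 1-based indices down to 0-based
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx) - offset] = float(val)
    return label, features
```

For example, `parse_libsvm_line("-1 3:1 11:1")` yields label -1.0 with features at 0-based indices 2 and 10; with `zero_based=True` the indices would be kept as-is.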
[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270825#comment-14270825 ]

Sean Owen commented on SPARK-5119:

Yes, the input must contain categories that are positive integers. I think this is a reasonable restriction. Although MLUtils.loadLibSVMFile will convert the 1-based feature numbers to 0-based, it leaves the -1 / 1 target untouched. Simply map your -1 labels to 0.
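The remapping Sean Owen suggests can be applied before training. A minimal sketch of the transformation itself, written as a plain function so it stands alone; in PySpark it would be applied per record, e.g. `data.map(lambda lp: LabeledPoint(remap_label(lp.label), lp.features))` (assuming the `LabeledPoint` records produced by loadLibSVMFile).

```python
# Sketch of the suggested fix: remap LIBSVM's -1/+1 binary labels to the
# 0/1 labels that DecisionTree.trainClassifier (numClasses=2) expects.
def remap_label(label):
    """Map a -1.0 label to 0.0; leave already-valid labels unchanged."""
    return 0.0 if label == -1.0 else label


labels = [-1.0, 1.0, -1.0]
print([remap_label(l) for l in labels])  # [0.0, 1.0, 0.0]
```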
[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270959#comment-14270959 ]

Apache Spark commented on SPARK-5119:

User 'Lewuathe' has created a pull request for this issue: https://github.com/apache/spark/pull/3975
[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270316#comment-14270316 ]

Kai Sasaki commented on SPARK-5119:

I think the impurity implementations in MLlib cannot handle negative labels; in this case the label is -1. https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala#L93 Should impurity support negative labels?
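The indexing problem Kai Sasaki points at can be illustrated without Spark: the Gini aggregator uses the label as an offset into its class-count array, so a -1 label produces the `ArrayIndexOutOfBoundsException: -1` in the trace. A minimal Python sketch follows; the class and method names are illustrative, not Spark's actual code, and the explicit range check mirrors the kind of validation a fix would add (on the JVM, `counts(-1)` throws; in Python a raw -1 index would silently wrap around instead).

```python
# Illustrative sketch (not Spark's code) of why a -1 label breaks the
# impurity aggregator: the label is used directly as an array index.
class GiniCounts:
    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.counts = [0.0] * num_classes  # one bin per class label

    def update(self, label, weight=1.0):
        idx = int(label)
        # Guard: labels must lie in [0, num_classes). Without it, a -1
        # label indexes outside the array (or wraps around in Python).
        if idx < 0 or idx >= self.num_classes:
            raise ValueError(
                "label %s out of range [0, %d)" % (label, self.num_classes))
        self.counts[idx] += weight
```

With labels remapped to 0/1 as suggested above, `update` accumulates normally; a raw -1.0 label is rejected instead of corrupting the statistics.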