[jira] [Commented] (SPARK-5119) java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model

2015-01-27 Thread Nicolas Garneau (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294046#comment-14294046 ]

Nicolas Garneau commented on SPARK-5119:


Hey guys, I am wondering what you think about letting users control whether
their feature vectors are 0-based or 1-based. I am used to 0-based vectors in
my datasets (I have worked a lot with scikit-learn), and I saw that the
loadLibSVMFile function converts every vector to 0-based.
I thought it would be cool to add an optional parameter or something similar.
Thanks guys, I'd be glad to give you some help :)
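
To make the suggestion concrete, here is a minimal sketch of a LIBSVM line parser with an optional index base. The function name parse_libsvm_line and the index_base parameter are hypothetical, invented for illustration; they are not part of MLUtils.loadLibSVMFile or any Spark API.

```python
def parse_libsvm_line(line, index_base=1):
    """Parse one LIBSVM-format line into (label, {zero_based_index: value}).

    index_base is a hypothetical option: 1 for standard LIBSVM files,
    0 for files whose feature indices already start at zero.
    """
    if index_base not in (0, 1):
        raise ValueError("index_base must be 0 or 1")
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        raw_index, raw_value = token.split(":", 1)
        # Shift the index so that the result is always 0-based internally.
        index = int(raw_index) - index_base
        if index < 0:
            raise ValueError("feature index %s is below index_base" % raw_index)
        features[index] = float(raw_value)
    return label, features


# The same example written with 1-based and with 0-based indices.
one_based = parse_libsvm_line("-1 3:1 11:1 14:1")
zero_based = parse_libsvm_line("-1 2:1 10:1 13:1", index_base=0)
```

Both calls produce the same 0-based representation, so the rest of a pipeline would not need to know which convention the file used.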

 java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
 ---

 Key: SPARK-5119
 URL: https://issues.apache.org/jira/browse/SPARK-5119
 Project: Spark
 Issue Type: Bug
 Components: ML, MLlib
 Affects Versions: 1.1.0, 1.2.0
 Environment: Linux ubuntu 14.04
 Reporter: Vivek Kulkarni
 Assignee: Kai Sasaki
 Fix For: 1.3.0


 First I checked whether a bug with a similar trace had been raised before. I
 found https://www.mail-archive.com/user@spark.apache.org/msg13708.html but
 the suggestion to upgrade to the latest code base (I cloned from the master
 branch) does not fix this issue.
 Issue: I try to train a decision tree classifier on some data. After training,
 when it begins the collect, it crashes:
 15/01/06 22:28:15 INFO BlockManagerMaster: Updated info of block rdd_52_1
 15/01/06 22:28:15 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 1895)
 java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.spark.mllib.tree.impurity.GiniAggregator.update(Gini.scala:93)
 at org.apache.spark.mllib.tree.impl.DTStatsAggregator.update(DTStatsAggregator.scala:100)
 at org.apache.spark.mllib.tree.DecisionTree$.orderedBinSeqOp(DecisionTree.scala:419)
 at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$nodeBinSeqOp$1(DecisionTree.scala:511)
 at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:536)
 at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:533)
 at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
 at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:533)
 at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
 at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 Minimal code:
 from pyspark.mllib.tree import DecisionTree
 from pyspark.mllib.util import MLUtils

 # sc is an existing SparkContext
 data = MLUtils.loadLibSVMFile(sc, '/scratch1/vivek/datasets/private/a1a').cache()
 model = DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)
 Just download the data from: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a1a



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5119) java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model

2015-01-09 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270825#comment-14270825 ]

Sean Owen commented on SPARK-5119:
--

Yes, the input must contain class labels that are non-negative integers. I
think this is a reasonable restriction. Although MLUtils.loadLibSVMFile will
convert the 1-based feature numbers to 0-based, it leaves the -1 / 1 target
labels untouched. Simply map your -1 labels to 0.
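
As a rough illustration of that remapping, the sketch below works on plain (label, features) tuples so it runs without Spark. The helper names remap_label, remap_examples and check_labels are made up for this example and are not MLlib functions.

```python
def remap_label(label):
    """Map a -1/+1 style label to 0/1; other labels are returned unchanged."""
    return 0.0 if label == -1 else float(label)


def remap_examples(examples):
    """Apply remap_label to every (label, features) pair."""
    return [(remap_label(label), features) for label, features in examples]


def check_labels(examples, num_classes):
    """Return the labels that are not integers in 0 .. num_classes - 1."""
    valid = {float(k) for k in range(num_classes)}
    return sorted({label for label, _ in examples if label not in valid})


# Toy data in the shape of the a1a file: labels are -1 or +1.
examples = [(-1.0, {2: 1.0}), (1.0, {4: 1.0}), (-1.0, {7: 1.0})]
bad_before = check_labels(examples, num_classes=2)

remapped = remap_examples(examples)
bad_after = check_labels(remapped, num_classes=2)
labels = sorted({label for label, _ in remapped})
```

With PySpark, the same function could presumably be applied before training with something like data.map(lambda p: LabeledPoint(remap_label(p.label), p.features)), where LabeledPoint comes from pyspark.mllib.regression; that line is untested here.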




[jira] [Commented] (SPARK-5119) java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model

2015-01-09 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270959#comment-14270959 ]

Apache Spark commented on SPARK-5119:
-

User 'Lewuathe' has created a pull request for this issue:
https://github.com/apache/spark/pull/3975




[jira] [Commented] (SPARK-5119) java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model

2015-01-08 Thread Kai Sasaki (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270316#comment-14270316 ]

Kai Sasaki commented on SPARK-5119:
---

I think the impurity implementation in MLlib cannot handle negative labels. In
this case the label is -1.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala#L93
Should impurity support negative labels?
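
To show why a label of -1 would end up as an array index, here is a simplified sketch of a per-class count aggregator. It assumes the label is used directly as the index into a counts array; the class ClassCountAggregator is a toy stand-in written for this note, not the actual GiniAggregator code.

```python
class ClassCountAggregator:
    """Toy per-class counter; a simplified stand-in, not MLlib's GiniAggregator."""

    def __init__(self, num_classes):
        self.counts = [0.0] * num_classes

    def update(self, label):
        index = int(label)
        # A JVM array rejects any index outside 0 .. length - 1. A Python list
        # would silently wrap -1 to the last slot, so check explicitly.
        if index < 0 or index >= len(self.counts):
            raise IndexError(
                "label %d is outside 0..%d" % (index, len(self.counts) - 1)
            )
        self.counts[index] += 1.0

    def gini(self):
        """Gini impurity: 1 minus the sum of squared class frequencies."""
        total = sum(self.counts)
        if total == 0:
            return 0.0
        return 1.0 - sum((c / total) ** 2 for c in self.counts)


agg = ClassCountAggregator(num_classes=2)
for label in [0.0, 1.0, 1.0, 0.0]:
    agg.update(label)
balanced_gini = agg.gini()

# A -1 label has no slot in a two-class counts array.
try:
    agg.update(-1.0)
    rejected = False
except IndexError:
    rejected = True
```

Under that assumption, labels 0 and 1 fit a two-slot array while -1 does not, which matches the index reported in the exception.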
