Hi experts, I followed the Spark ML pipeline guide <http://spark.apache.org/docs/latest/ml-guide.html> to test DecisionTreeClassifier in the spark-shell with Spark 1.4.1, but I always hit the error below. Do you have any idea how to fix this?
The error stack:

java.lang.IllegalArgumentException: DecisionTreeClassifier was given input with invalid label column label, without the number of classes specified. See StringIndexer.
        at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:71)
        at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:41)
        at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
        at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:133)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:129)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:42)
        at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:43)
        at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:129)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
        at $iwC$$iwC$$iwC.<init>(<console>:59)
        at $iwC$$iwC.<init>(<console>:61)
        at $iwC.<init>(<console>:63)
        at <init>(<console>:65)
        at .<init>(<console>:69)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)

The code I executed (run in the spark-shell; imports listed for completeness):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import sqlContext.implicits._

// Labeled and unlabeled instance types.
// Spark SQL can infer schema from case classes.
case class LabeledDocument(id: Long, text: String, label: Double)
case class Document(id: Long, text: String)

// Prepare training documents, which are labeled.
val training = sc.parallelize(Seq(
  LabeledDocument(0L, "a b c d e spark", 1.0),
  LabeledDocument(1L, "b d", 0.0),
  LabeledDocument(2L, "spark f g h", 1.0),
  LabeledDocument(3L, "hadoop mapreduce", 0.0)))

// Configure an ML pipeline with three stages: tokenizer, hashingTF, and the classifier.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features")
val lr = new DecisionTreeClassifier().setMaxDepth(5).setMaxBins(32).setImpurity("gini")
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// The error is raised by the following line.
val model = pipeline.fit(training.toDF)
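From the message it sounds like the label column is missing metadata about the number of classes, and the exception itself points at StringIndexer. Below is a minimal sketch of what I guess the fix might look like: add a StringIndexer stage that indexes the raw label into a new column, and point the classifier at that column instead. The indexedLabel column name and the labelIndexer/dt variable names are my own additions, not from the guide, so please correct me if this is the wrong approach.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}

// Index the raw label so the resulting column carries the class-count
// metadata the tree classifier needs.
// (Assumption: StringIndexer in 1.4.1 can consume the existing "label" column;
// if it insists on a string column, the label may need to be stored as a String first.)
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

// Point the tree at the indexed label column instead of the raw one.
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setMaxDepth(5)
  .setMaxBins(32)
  .setImpurity("gini")

val pipeline = new Pipeline()
  .setStages(Array(labelIndexer, tokenizer, hashingTF, dt))

// Same training DataFrame as above.
val model = pipeline.fit(training.toDF)

Is adding the StringIndexer stage like this the intended way to supply the number of classes, or is there a more direct way to attach that metadata to the label column?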