I think somewhere along the line you've not specified your label column -- it's defaulting to "label", and Spark doesn't recognize that column as a binary or nominal attribute (i.e. it carries no metadata saying how many classes it holds).
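The usual fix is to run the label column through StringIndexer and point the classifier at the indexed column. Below is a minimal, untested sketch against the Spark 1.4 ML API; it reuses the `training` RDD from your snippet, and the "indexedLabel" column name is just an illustrative choice. If StringIndexer in your version rejects the numeric Double label, cast that column to string first.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}

// Index the raw "label" column; this attaches the nominal metadata
// (number of classes) that DecisionTreeClassifier is asking for.
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")   // illustrative column name, not required by Spark

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

// Train on the indexed label instead of the default "label" column.
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setMaxDepth(5)
  .setMaxBins(32)
  .setImpurity("gini")

val pipeline = new Pipeline().setStages(Array(labelIndexer, tokenizer, hashingTF, dt))
val model = pipeline.fit(training.toDF)   // training is the RDD from your code below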
On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole <hujie.ea...@gmail.com> wrote:
> Hi, Experts,
>
> I followed the Spark ML pipeline guide to test DecisionTreeClassifier in the
> spark shell with Spark 1.4.1, but it always fails with the error below. Do you
> have any idea how to fix this?
>
> The error stack:
> java.lang.IllegalArgumentException: DecisionTreeClassifier was given input
> with invalid label column label, without the number of classes specified.
> See StringIndexer.
>   at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:71)
>   at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:41)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
>   at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:133)
>   at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:129)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:42)
>   at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:43)
>   at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:129)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
>   at $iwC$$iwC$$iwC.<init>(<console>:59)
>   at $iwC$$iwC.<init>(<console>:61)
>   at $iwC.<init>(<console>:63)
>   at <init>(<console>:65)
>   at .<init>(<console>:69)
>   at .<clinit>(<console>)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at $print(<console>)
>
> The code executed is:
> // Labeled and unlabeled instance types.
> // Spark SQL can infer schema from case classes.
> case class LabeledDocument(id: Long, text: String, label: Double)
> case class Document(id: Long, text: String)
>
> // Prepare training documents, which are labeled.
> val training = sc.parallelize(Seq(
>   LabeledDocument(0L, "a b c d e spark", 1.0),
>   LabeledDocument(1L, "b d", 0.0),
>   LabeledDocument(2L, "spark f g h", 1.0),
>   LabeledDocument(3L, "hadoop mapreduce", 0.0)))
>
> // Configure an ML pipeline, which consists of three stages:
> // tokenizer, hashingTF, and the decision tree classifier.
> val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
> val hashingTF = new HashingTF()
>   .setNumFeatures(1000)
>   .setInputCol(tokenizer.getOutputCol)
>   .setOutputCol("features")
> val dt = new DecisionTreeClassifier().setMaxDepth(5).setMaxBins(32).setImpurity("gini")
> val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, dt))
>
> // The error is raised by the following line
> val model = pipeline.fit(training.toDF)