I think somewhere along the line you've not specified your label
column -- it's defaulting to "label", and the classifier doesn't
recognize it, or at least not as a binary or nominal attribute.
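Running the label column through a StringIndexer before the tree stage should attach the nominal metadata the classifier is looking for. Untested sketch against the 1.4 API -- "indexedLabel" is just a name I picked:

```scala
import org.apache.spark.ml.feature.StringIndexer

// Index the raw labels; the output column carries metadata with the
// number of classes, which DecisionTreeClassifier needs.
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")

// Point the classifier at the indexed column instead of the raw one.
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setMaxDepth(5)
  .setMaxBins(32)
  .setImpurity("gini")

// Insert the indexer as a stage before the classifier.
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, labelIndexer, dt))
```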

On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole <hujie.ea...@gmail.com> wrote:
> Hi, Experts,
>
> I followed the Spark ML pipeline guide to test DecisionTreeClassifier in the
> spark shell with Spark 1.4.1, but I always hit the error below. Do you have
> any idea how to fix it?
>
> The error stack:
> java.lang.IllegalArgumentException: DecisionTreeClassifier was given input
> with invalid label column label, without the number of classes specified.
> See StringIndexer.
>         at
> org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:71)
>         at
> org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:41)
>         at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>         at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
>         at
> org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:133)
>         at
> org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:129)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at
> scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:42)
>         at
> scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:43)
>         at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:129)
>         at
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
>         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
>         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
>         at $iwC$$iwC$$iwC.<init>(<console>:59)
>         at $iwC$$iwC.<init>(<console>:61)
>         at $iwC.<init>(<console>:63)
>         at <init>(<console>:65)
>         at .<init>(<console>:69)
>         at .<clinit>(<console>)
>         at .<init>(<console>:7)
>         at .<clinit>(<console>)
>         at $print(<console>)
>
> The execute code is:
> // Labeled and unlabeled instance types.
> // Spark SQL can infer schema from case classes.
> case class LabeledDocument(id: Long, text: String, label: Double)
> case class Document(id: Long, text: String)
> // Prepare training documents, which are labeled.
> val training = sc.parallelize(Seq(
>   LabeledDocument(0L, "a b c d e spark", 1.0),
>   LabeledDocument(1L, "b d", 0.0),
>   LabeledDocument(2L, "spark f g h", 1.0),
>   LabeledDocument(3L, "hadoop mapreduce", 0.0)))
>
> // Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
> val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
> val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features")
> val lr = new DecisionTreeClassifier().setMaxDepth(5).setMaxBins(32).setImpurity("gini")
> val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
>
> // Error raises from the following line
> val model = pipeline.fit(training.toDF)
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
