Documentation error in MLlib - Clustering?

Emre Sevinc Fri, 13 Feb 2015 08:28:02 -0800

Hello,

I was trying the streaming kmeans clustering example in the official
documentation at:


   http://spark.apache.org/docs/1.2.0/mllib-clustering.html

But I got a type error when I tried to compile the code:

[error]  found   : org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.regression.LabeledPoint]
[error]  required: org.apache.spark.streaming.dstream.DStream[(?, org.apache.spark.mllib.linalg.Vector)]
[error] model.predictOnValues(testData).print()
[error]   ^
[error] one error found
[error] (compile:compile) Compilation failed


And it seems like the solution is to use

   model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

as shown in
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala

instead of

   model.predictOnValues(testData).print()

as written in the documentation.
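
To make the type mismatch concrete, here is a minimal sketch of the corrected call wrapped in a small helper. It assumes the Spark 1.2-era MLlib API, where predictOnValues expects a DStream of (key, Vector) pairs and returns a DStream of (key, cluster index) pairs. The object and method names (PredictOnLabeledPoints, predictWithLabels) are hypothetical and only for illustration; they are not part of Spark.

```scala
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper object, not part of Spark.
object PredictOnLabeledPoints {

  // predictOnValues wants DStream[(K, Vector)], so each LabeledPoint is
  // first split into a (label, features) pair. The label acts as the key
  // and is carried through unchanged next to the predicted cluster index.
  def predictWithLabels(
      model: StreamingKMeans,
      testData: DStream[LabeledPoint]): DStream[(Double, Int)] = {
    val keyed: DStream[(Double, Vector)] =
      testData.map(lp => (lp.label, lp.features))
    model.predictOnValues(keyed)
  }
}
```

With a helper like this, the last line of the documentation example would become PredictOnLabeledPoints.predictWithLabels(model, testData).print(), which is equivalent to the inline map shown above.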

I just wanted to draw attention to this, so that one of the maintainers
can fix the documentation.

-- 
Emre Sevinç
