The algorithm's update is simply split into two steps: trainOn, which
learns and updates the cluster centers, and predictOn, which predicts the
cluster assignment for each data point.

The StreamingKMeansExample you reference breaks up data into training and
test because you might want to score the predictions.  If you don't care
about that, you could just use a single stream for both steps.
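
To make the two steps concrete, here is a minimal sketch in plain Python.
It is not Spark's API: ToyStreamingKMeans, train_on and predict_on are
hypothetical names that only mirror the trainOn/predictOn split, and the
update is a simple per-point running mean rather than MLlib's exact
mini-batch rule. The same batches are fed to both steps, which is the
single-stream case described above.

```python
from math import dist  # Euclidean distance, Python 3.8+


class ToyStreamingKMeans:
    """Hypothetical stand-in for a streaming k-means model (not Spark's API)."""

    def __init__(self, centers):
        self.centers = [list(c) for c in centers]
        self.counts = [0] * len(self.centers)  # points absorbed by each center

    def predict_on(self, batch):
        # "predictOn" step: index of the nearest current center for each point.
        return [
            min(range(len(self.centers)), key=lambda j: dist(p, self.centers[j]))
            for p in batch
        ]

    def train_on(self, batch):
        # "trainOn" step: assign the batch with the current centers, then
        # move each chosen center toward its points with a running mean.
        for p, j in zip(batch, self.predict_on(batch)):
            self.counts[j] += 1
            n = self.counts[j]
            self.centers[j] = [c + (x - c) / n for c, x in zip(self.centers[j], p)]


# Made-up data: one "stream" of two small batches, used for both steps.
stream = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(1.0, 1.0), (9.0, 9.0)],
]

model = ToyStreamingKMeans(centers=[(2.0, 2.0), (8.0, 8.0)])
assignments = []
for batch in stream:
    model.train_on(batch)                         # update the centers
    assignments.append(model.predict_on(batch))   # label the same batch

print(assignments)
print(model.centers)
```

No labels appear anywhere, so the procedure stays unsupervised. A separate
test stream only matters if you want predictions on data that was not used
to update the centers.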

On Thu, Aug 11, 2016 at 9:14 AM, Ahmed Sadek <don1...@gmail.com> wrote:

> Dear All,
>
> I was wondering why there is training data and testing data in kmeans ?
> Shouldn't it be unsupervised learning with just access to stream data ?
>
> I found similar question but couldn't understand the answer.
> http://stackoverflow.com/questions/30972057/is-the-streaming-k-means-clustering-predefined-in-mllib-library-of-spark-supervi
>
> Thanks!
> Ahmed
>
