Re: Streaming KMeans clustering

Johannes Schulte Fri, 27 Dec 2013 12:56:43 -0800

I updated the repository (with the typo)

g...@github.com:baunz/cluster-comprarison.git

to include more logging about how often the distance measure is
calculated (which is the most expensive operation, imo). The number of
distance calculations per point seen is about 40 for streaming k-means
and 10 for regular k-means (because there are 10 clusters).

This of course depends on the searchSize parameter, but I used the
default value of 2.
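
For anyone who wants to see how such a count can be collected, here is a
minimal sketch in Python. It is not the code from the repository and not
Mahout's implementation: CountingDistance, euclidean and
assign_brute_force are hypothetical names made up for this illustration.
It only shows why a brute-force assignment pass costs exactly one
distance calculation per cluster for every point, which is where the 10
for regular k-means comes from.

```python
import math
import random


class CountingDistance:
    """Hypothetical wrapper: delegates to a distance function and counts calls."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_brute_force(points, centroids, distance):
    """One assignment pass: compare every point with every centroid."""
    return [
        min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
        for p in points
    ]


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(1000)]
centroids = points[:10]  # 10 clusters, as in the comparison above

distance = CountingDistance(euclidean)
assignments = assign_brute_force(points, centroids, distance)

# Distance calculations per point seen for a single brute-force pass.
calls_per_point = distance.calls / len(points)
print(calls_per_point)
```

The same kind of wrapper could be placed around whatever distance
measure a clustering run uses, so that the per-point factor can be
logged for different searchSize settings.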

On Fri, Dec 27, 2013 at 6:54 PM, Isabel Drost-Fromm <isa...@apache.org> wrote:

>
> Hi Dan,
>
>
> On Fri, 27 Dec 2013 14:13:51 +0200
> Dan Filimon <dfili...@apache.org> wrote:
> > Thoughts?
>
> First of all - good to see you back on dev@ :)
>
> Seems a few people have run into these issues. As there is currently no
> high-level documentation for the whole streaming k-means implementation,
> would you mind writing up the limitations and the advice you have for
> users of this algorithm? It doesn't need to be anything fancy:
> essentially, here's how you compute how much memory you need to run
> this, here are the limitations and the flags to deal with them, here are
> things that should be changed or fixed in a later iteration (unless your
> previous mail covers all of this already). This could save people a few
> debugging cycles when getting started with this at scale.
>
> Feel free to get it onto our web page (if you are short on time, just
> write something up using markdown, I can take over publishing it).
>
> Isabel
>
