Re: Mahout KMeans clustering results

Arshad Khan Wed, 24 Feb 2010 22:27:51 -0800

Thanks for the help and explanation. :)

On Thu, Feb 25, 2010 at 1:20 PM, Jake Mannix <[email protected]> wrote:


> And to clarify: you can use either one, but you should think of them like
> this:
> RandomAccessSparseVector is useful for vectors whose contents change
> a great deal (the moving centroids of a clustering algorithm, for example),
> and SequentialAccessSparseVectors are useful (i.e., faster) in the case where
> they are built up, and then are essentially used in an immutable fashion
> (you repeatedly compute a lot of dot-products and add multiples of them
> onto other vectors [either DenseVectors or RandomAccessSparseVectors]).
>
>  -jake
>
> On Wed, Feb 24, 2010 at 7:45 PM, Robin Anil <[email protected]> wrote:
>
> > They are replaced by the two impls RandomAccessSparseVector or
> > SequentialAccessSparseVector
> >
> >
> > On Thu, Feb 25, 2010 at 9:10 AM, Arshad Khan <[email protected]> wrote:
> >
> > > Thanks for the quick reply.
> > >
> > > I have downloaded the latest 0.3 code. There seem to be significant
> > > changes in this version. For example, I am currently using the
> > > org.apache.mahout.matrix.SparseVector class, but in 0.3 I cannot find
> > > this class.
> > >
> > > What class is it replaced with?
> > >
> > > Thanks
> > >
> > > On Thu, Feb 25, 2010 at 10:12 AM, Ted Dunning <[email protected]> wrote:
> > >
> > > > There are known problems with that version of k-means.
> > > >
> > > > Try using the trunk version.  0.3 is very close and we are entering
> > > > code freeze for that so you should be fine with the latest version.
> > > >
> > > > On Wed, Feb 24, 2010 at 5:46 PM, Arshad Khan <[email protected]> wrote:
> > > >
> > > > > Hello
> > > > >
> > > > > I am using the Mahout 0.2 implementation of KMeans in one of my text
> > > > > mining projects. I apply KMeans with a default K value of 4. It seems
> > > > > that every time I repeat the clustering process on the same data set,
> > > > > the results are different, and the difference (in terms of cluster
> > > > > size and membership) is great from run to run. The initial set of
> > > > > centroid points is chosen randomly through RandomSeedGenerator. Is
> > > > > there a way to obtain more consistent results that do not differ so
> > > > > greatly? Or maybe I am doing something wrong?
> > > > >
> > > > > Any help or idea is very much appreciated.
> > > > >
> > > > > Thanks and Regards
> > > > > Arshad
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Ted Dunning, CTO
> > > > DeepDyve
> > > >
> > >
> >
>
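
For anyone reading this thread later, here is a rough sketch of the trade-off
Jake describes above. It is plain Python, not Mahout code: the class names
RandomAccessSketch and SequentialAccessSketch and the helper add_scaled are
made up for illustration, and the storage layouts (a hash map versus sorted
parallel arrays) are my assumption about the general idea, not a copy of the
Mahout classes.

```python
from bisect import bisect_left


class RandomAccessSketch:
    """Hypothetical hash-backed sparse vector: cheap single-index updates."""

    def __init__(self):
        self.entries = {}  # index -> non-zero value

    def get(self, index):
        return self.entries.get(index, 0.0)

    def set(self, index, value):
        if value == 0.0:
            self.entries.pop(index, None)  # keep only non-zero entries
        else:
            self.entries[index] = value


class SequentialAccessSketch:
    """Hypothetical sorted-array sparse vector: built once, then scanned."""

    def __init__(self, entries):
        pairs = sorted((i, v) for i, v in entries.items() if v != 0.0)
        self.indices = [i for i, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, index):
        # Binary search, since there is no hash lookup here.
        pos = bisect_left(self.indices, index)
        if pos < len(self.indices) and self.indices[pos] == index:
            return self.values[pos]
        return 0.0

    def dot(self, other):
        # Merge-walk both sorted index arrays in one linear pass.
        i = j = 0
        total = 0.0
        while i < len(self.indices) and j < len(other.indices):
            a, b = self.indices[i], other.indices[j]
            if a == b:
                total += self.values[i] * other.values[j]
                i += 1
                j += 1
            elif a < b:
                i += 1
            else:
                j += 1
        return total


def add_scaled(target, source, scale):
    """Add scale * source (sequential) onto target (random access)."""
    for index, value in zip(source.indices, source.values):
        target.set(index, target.get(index) + scale * value)


doc_a = SequentialAccessSketch({0: 1.0, 3: 2.0, 7: 4.0})
doc_b = SequentialAccessSketch({3: 5.0, 7: 0.5, 9: 1.0})
centroid = RandomAccessSketch()
add_scaled(centroid, doc_a, 0.5)
```

The document vectors are built once and only read afterwards (dot products,
ordered scans), while the centroid is the one that keeps changing, which is
the split described above.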

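On the original question about run-to-run differences: separately from the
known k-means problems Ted mentions, any k-means that picks its starting
centroids at random can return different clusters on each run. The sketch
below is a toy k-means in plain Python, not Mahout's implementation and not
RandomSeedGenerator; the function names, the seed parameter and the sample
points are all made up for illustration. It shows two common ways to get
steadier results: fix the random seed, or run several seeds and keep the
run with the lowest cost.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, seed, iterations=20):
    """Toy k-means; the seed fully determines the initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    cost = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, cost


def best_of(points, k, seeds):
    """Run once per seed and keep the lowest-cost result."""
    return min((kmeans(points, k, s) for s in seeds), key=lambda r: r[1])


# Made-up sample data: two well-separated groups of points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

first = kmeans(POINTS, 2, seed=7)
second = kmeans(POINTS, 2, seed=7)   # same seed, same result
best = best_of(POINTS, 2, seeds=range(5))
```

A fixed seed only makes runs repeatable; it does not make the clustering
better. Keeping the best of several seeded runs addresses quality as well.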