Re: How KMeans clustering works in Mahout 0.8?

2014-01-28 Thread Suneel Marthi
All of Mahout's clustering algorithms can be run in both MR and non-MR mode.
By default it's the MR mode that's executed, unless the user chooses the non-MR
mode by specifying '-xm sequential' while invoking the driver.
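
To illustrate the idea of a driver that picks the execution path, here is a toy
sketch in Python. It is not Mahout's KMeansDriver: the function names and the
strings they return are hypothetical, and only the '-xm' option with its
'mapreduce' default and 'sequential' alternative mirrors what is described
above.

```python
import argparse


def run_mapreduce(input_path):
    # Hypothetical stand-in for submitting map-reduce jobs to a cluster.
    return f"mapreduce run on {input_path}"


def run_sequential(input_path):
    # Hypothetical stand-in for an in-memory, single-process run.
    return f"sequential run on {input_path}"


RUNNERS = {"mapreduce": run_mapreduce, "sequential": run_sequential}


def run_driver(argv):
    """Parse the options and dispatch to the chosen execution method."""
    parser = argparse.ArgumentParser(prog="toy-kmeans-driver")
    parser.add_argument("-i", "--input", required=True)
    # MR is the default; the user has to opt in to the sequential path.
    parser.add_argument("-xm", "--method",
                        choices=sorted(RUNNERS), default="mapreduce")
    args = parser.parse_args(argv)
    return RUNNERS[args.method](args.input)
```

Calling run_driver(["-i", "points"]) takes the MR branch, while adding
"-xm", "sequential" takes the other one. The point is only that the choice
lives in one place, the driver.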





On Wednesday, January 29, 2014 1:09 AM, Saeed Adel Mehraban wrote:
 
Thank you for the details.
So KMeans can be run in either a map-reduce or a non-map-reduce version, and the
decision is made in the driver, yes?



On Tue, Jan 28, 2014 at 11:12 PM, Yoonmin Nam wrote:

> K-Means clustering is implemented through the
> org.apache.mahout.clustering.iterator package.
>
> That package contains two classes, CIMapper and CIReducer.
>
> Both of them are used when the execution method (-xm) is set to mapreduce.
>
> This mapper and reducer then do the work for you!
>
> Thanks.
>
> -----Original Message-----
> From: Saeed Adel Mehraban [mailto:s.ade...@gmail.com]
> Sent: Wednesday, January 29, 2014 4:35 AM
> To: user@mahout.apache.org
> Subject: Re: How KMeans clustering works in Mahout 0.8?
>
> I see the package, but I couldn't find anything related to map-reduce. I
> wonder why!
>
>
> On Tue, Jan 28, 2014 at 4:14 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> > In the source code you could take a look in the
> > org.apache.mahout.clustering.kmeans package to get a start, if you
> > want to understand the implementation.
> >
> > If you just want to run some clustering, take a look at
> > examples/bin/cluster-reuters.sh which has an option to run kmeans.
> >
> >
> > On Mon, Jan 27, 2014 at 5:51 AM, Saeed Adel Mehraban wrote:
> >
> > > I read the Mahout KMeans design of implementation, and it seems to be
> > > clear wrt the map-reduce paradigm. But when I refer to the source code,
> > > I cannot find the mapper, reducer, combiner or almost anything mentioned
> > > on the official website. What happened here, and what do I need to do to
> > > understand the KMeans implementation in Mahout?
> > >
> >
>

Re: How KMeans clustering works in Mahout 0.8?

2014-01-28 Thread Saeed Adel Mehraban
Thank you for the details.
So KMeans can be run in either a map-reduce or a non-map-reduce version, and the
decision is made in the driver, yes?


On Tue, Jan 28, 2014 at 11:12 PM, Yoonmin Nam wrote:

> K-Means clustering is implemented through the
> org.apache.mahout.clustering.iterator package.
>
> That package contains two classes, CIMapper and CIReducer.
>
> Both of them are used when the execution method (-xm) is set to mapreduce.
>
> This mapper and reducer then do the work for you!
>
> Thanks.
>
> -----Original Message-----
> From: Saeed Adel Mehraban [mailto:s.ade...@gmail.com]
> Sent: Wednesday, January 29, 2014 4:35 AM
> To: user@mahout.apache.org
> Subject: Re: How KMeans clustering works in Mahout 0.8?
>
> I see the package, but I couldn't find anything related to map-reduce. I
> wonder why!
>
>
> On Tue, Jan 28, 2014 at 4:14 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> > In the source code you could take a look in the
> > org.apache.mahout.clustering.kmeans package to get a start, if you
> > want to understand the implementation.
> >
> > If you just want to run some clustering, take a look at
> > examples/bin/cluster-reuters.sh which has an option to run kmeans.
> >
> >
> > On Mon, Jan 27, 2014 at 5:51 AM, Saeed Adel Mehraban wrote:
> >
> > > I read the Mahout KMeans design of implementation, and it seems to be
> > > clear wrt the map-reduce paradigm. But when I refer to the source code,
> > > I cannot find the mapper, reducer, combiner or almost anything mentioned
> > > on the official website. What happened here, and what do I need to do to
> > > understand the KMeans implementation in Mahout?
> > >
> >
>


Re: Mahout 0.9 Release

2014-01-28 Thread Andrew Musselman
Looks good.

+1


On Tue, Jan 28, 2014 at 8:07 PM, Andrew Palumbo wrote:

> a), b), c), d) all passed here.
>
> The CosineDistance values of clustered points from cluster-reuters.sh
> (option 1, kmeans) were within the range [0,1].
>
> > Date: Tue, 28 Jan 2014 16:45:42 -0800
> > From: suneel_mar...@yahoo.com
> > Subject: Mahout 0.9 Release
> > To: user@mahout.apache.org; d...@mahout.apache.org
> >
> > Fixed the issues that were reported with the clustering code this past week
> > and upgraded the codebase to Lucene 4.6.1, which was released today.
> >
> > Here's the URL for the 0.9 release in staging:
> > https://repository.apache.org/content/repositories/orgapachemahout-1004/org/apache/mahout/mahout-distribution/0.9/
> >
> > The artifacts have been signed with the following key:
> > https://people.apache.org/keys/committer/smarthi.asc
> >
> > Please:
> > a) Verify that you can unpack the release (tar or zip)
> > b) Verify that you are able to compile the distro
> > c) Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > through all the different options in each script.
> >
> > Need a minimum of 3 '+1' votes from the PMC for the release to be finalized.
>
>


Mahout 0.9 Release

2014-01-28 Thread Suneel Marthi
Fixed the issues that were reported with the clustering code this past week and
upgraded the codebase to Lucene 4.6.1, which was released today.

Here's the URL for the 0.9 release in staging:
https://repository.apache.org/content/repositories/orgapachemahout-1004/org/apache/mahout/mahout-distribution/0.9/

The artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc

Please:
a) Verify that you can unpack the release (tar or zip)
b) Verify that you are able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through
all the different options in each script.

Need a minimum of 3 '+1' votes from the PMC for the release to be finalized.

Why is Mean-Shift Clustering Deprecated?

2014-01-28 Thread robpd
Hi,

I'm using mahout-0.8 right now and noticed that mean-shift clustering is
deprecated.  The only thing I could find on it is
https://issues.apache.org/jira/browse/MAHOUT-1250.  This seems to say that
it is not being used much, right? Are there any other reasons it's deprecated
(e.g. reliability)?

My main question is what could replace this useful technique, in which one does
not have to know a priori the number of clusters to form.  I guess that, after
mean-shift clustering is finally removed from the codebase, it will be
possible to achieve the same effect by first using canopy clustering to
discover the number of clusters and then running KMeans. Correct?
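
For concreteness, the canopy-then-KMeans idea can be sketched roughly as below.
This is a simplified Python illustration under stated assumptions, not Mahout's
implementation: the function names are made up, each canopy center is simply
the first remaining point (rather than an average of nearby points), and only
the tight threshold, usually called T2, is modelled because only the centers
are needed here.

```python
import math


def canopy_centers(points, t2):
    """Greedy canopy seeding: each pass takes the next remaining point as a
    center and drops every point within the tight threshold t2 of it."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points closer than t2 to this center cannot start a new canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations, seeded with the canopy centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = canopy_centers(points, t2=3.0)   # number of seeds plays the role of k
final = kmeans(points, seeds)
```

The number of clusters is not given up front; it falls out of the distance
threshold, which is the part that still has to be chosen by hand.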

Thanks.

Rob





RE: How KMeans clustering works in Mahout 0.8?

2014-01-28 Thread Yoonmin Nam
K-Means clustering is implemented through the
org.apache.mahout.clustering.iterator package.

That package contains two classes, CIMapper and CIReducer.

Both of them are used when the execution method (-xm) is set to mapreduce.

This mapper and reducer then do the work for you!
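
As a rough picture of what a mapper/reducer pair does in one k-means
iteration, here is a textbook-style sketch in Python. It is not the code of
CIMapper or CIReducer, and every name in it is hypothetical: the map step
assigns each point to its nearest centroid, and the reduce step averages the
points assigned to each centroid.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (cluster id, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce: group points by cluster id and average each group."""
    groups = defaultdict(list)
    for cluster_id, point in pairs:
        groups[cluster_id].append(point)
    updated = list(centroids)  # clusters with no points keep their centroid
    for cluster_id, members in groups.items():
        updated[cluster_id] = tuple(
            sum(col) / len(members) for col in zip(*members)
        )
    return updated


def kmeans(points, centroids, max_iter=10):
    """Driver loop: repeat map and reduce until the centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a real map-reduce job the grouping by cluster id is done by the framework's
shuffle between the two phases; here it is folded into reduce_phase to keep
the sketch self-contained.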

Thanks.

-----Original Message-----
From: Saeed Adel Mehraban [mailto:s.ade...@gmail.com] 
Sent: Wednesday, January 29, 2014 4:35 AM
To: user@mahout.apache.org
Subject: Re: How KMeans clustering works in Mahout 0.8?

I see the package, but I couldn't find anything related to map-reduce. I
wonder why!


On Tue, Jan 28, 2014 at 4:14 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> In the source code you could take a look in the 
> org.apache.mahout.clustering.kmeans package to get a start, if you 
> want to understand the implementation.
>
> If you just want to run some clustering, take a look at 
> examples/bin/cluster-reuters.sh which has an option to run kmeans.
>
>
> On Mon, Jan 27, 2014 at 5:51 AM, Saeed Adel Mehraban wrote:
>
> > I read the Mahout KMeans design of implementation, and it seems to be
> > clear wrt the map-reduce paradigm. But when I refer to the source code,
> > I cannot find the mapper, reducer, combiner or almost anything mentioned
> > on the official website. What happened here, and what do I need to do to
> > understand the KMeans implementation in Mahout?
> >
>







Re: How KMeans clustering works in Mahout 0.8?

2014-01-28 Thread Suneel Marthi
Look at KMeansDriver.java in the specified package and trace through the code.
You should see both the MR and non-MR versions of the kmeans implementation.





On Tuesday, January 28, 2014 2:35 PM, Saeed Adel Mehraban wrote:
 
I see the package, but I couldn't find anything related to map-reduce. I
wonder why!



On Tue, Jan 28, 2014 at 4:14 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> In the source code you could take a look in the
> org.apache.mahout.clustering.kmeans package to get a start, if you want to
> understand the implementation.
>
> If you just want to run some clustering, take a look at
> examples/bin/cluster-reuters.sh which has an option to run kmeans.
>
>
> On Mon, Jan 27, 2014 at 5:51 AM, Saeed Adel Mehraban wrote:
>
> > I read the Mahout KMeans design of implementation, and it seems to be
> > clear wrt the map-reduce paradigm. But when I refer to the source code,
> > I cannot find the mapper, reducer, combiner or almost anything mentioned
> > on the official website. What happened here, and what do I need to do to
> > understand the KMeans implementation in Mahout?
> >
>

Re: How KMeans clustering works in Mahout 0.8?

2014-01-28 Thread Saeed Adel Mehraban
I see the package, but I couldn't find anything related to map-reduce. I
wonder why!


On Tue, Jan 28, 2014 at 4:14 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> In the source code you could take a look in the
> org.apache.mahout.clustering.kmeans package to get a start, if you want to
> understand the implementation.
>
> If you just want to run some clustering, take a look at
> examples/bin/cluster-reuters.sh which has an option to run kmeans.
>
>
> On Mon, Jan 27, 2014 at 5:51 AM, Saeed Adel Mehraban wrote:
>
> > I read the Mahout KMeans design of implementation, and it seems to be
> > clear wrt the map-reduce paradigm. But when I refer to the source code,
> > I cannot find the mapper, reducer, combiner or almost anything mentioned
> > on the official website. What happened here, and what do I need to do to
> > understand the KMeans implementation in Mahout?
> >
>


Re: Classify Handwritten Digits

2014-01-28 Thread Chameera Wijebandara
Thanks Ted.

Thanks,
Chameera


On Fri, Jan 24, 2014 at 11:12 PM, Ted Dunning wrote:

> You can also put out lots of clusters and use cluster membership as the
> features for a classifier.
>
> There was a discussion here (or possibly on the dev@mahout list) on this
> topic several weeks ago.  Search the archives for "iris" and my name.
>
>
>
> On Fri, Jan 24, 2014 at 8:46 AM, Angus Macnab wrote:
>
> > You can do supervised learning by outputting the clusters and labeling
> > them 0-9.
> >
> > > On Jan 23, 2014, at 10:34 PM, Tharindu Rusira
> > > <tharindurus...@gmail.com> wrote:
> > >
> > > On Fri, Jan 24, 2014 at 9:50 AM, Angus Macnab wrote:
> > >
> > >> This is a pretty classic machine learning problem and can be handled
> > >> with several different algorithms.  Logistic regression is the obvious
> > >> choice, but clustering algorithms will work fine also.  Just decompose
> > >> the pixels into a really long vector and train your algorithm with the
> > >> input-output pairs.  You can get 100% accuracy on this pretty easily if
> > >> you are careful with your bias-variance decomposition.  This is a fun
> > >> one for neural networks too!
> > >>
> > >> Essentially any machine learning book will delve into greater detail
> > >> on this as the US postal digit data has been around for a long time.
> > >> I think Kaggle even had this as a training exercise for a while, so
> > >> there's probably a ton of discussion of various methods and algorithms
> > >> on their message boards.
> > >>
> > >> For kicks why don't you compare k-means clustering to logistic
> > >> regression using Mahout?
> > > Hi Angus, Chameera's requirement is to classify handwritten digits, so
> > > could you please explain how K-means clustering could be helpful in
> > > this scenario? Of course it would find different clusters but this is
> > > still a classification problem. Please correct me if I'm wrong.
> > >
> > > Thanks,
> > >
> > >
> > >>
> > >> -Angus
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Jan 23, 2014 at 8:00 PM, Chameera Wijebandara <
> > >> chameerawijeband...@gmail.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I am trying to classify handwritten digits using Mahout classification.
> > >>> Any suggestions for coming up with a good solution?
> > >>>
> > >>> --
> > >>> Thanks,
> > >>>Chameera
> > >
> > >
> > >
> > > --
> > > M.P. Tharindu Rusira Kumara
> > >
> > > Department of Computer Science and Engineering,
> > > University of Moratuwa,
> > > Sri Lanka.
> > > +94757033733
> > > www.tharindu-rusira.blogspot.com
> >
>