Mahout parallel K-Means - algorithms analysis

2014-03-15 Thread hiroshi leon
To whom it may concern,

Hello, I have been examining the MapReduce version of k-means in Mahout 0.9,
and I would like to know where I can check the code that runs inside the
map function and the reducer.

I was debugging in NetBeans, but I was not able to find exactly what is
implemented in the Map and Reduce functions.

The reason I am doing this is that I would like to know exactly what is
implemented in Mahout 0.9, in order to see which parts of the K-Means
MapReduce algorithm were optimized.

Do you know which research paper Mahout's K-means was based on, or where
I can read the pseudocode?

Thank you so much!

Best regards!

Hiroshi
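For reference, the usual textbook way to express one k-means iteration as a map step and a reduce step is sketched below. This is a minimal illustration under stated assumptions, not Mahout's CIMapper/CIReducer code: the function names (nearest, kmeans_map, kmeans_reduce, run_iteration) are hypothetical, distance is squared Euclidean, and the shuffle between map and reduce is simulated with a dictionary. The mapper assigns each point to its nearest current centroid; the reducer averages the points assigned to each cluster to produce the new centroid.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean; first wins ties)."""
    best, best_d = 0, math.inf
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = i, d
    return best

def kmeans_map(point, centroids):
    """Hypothetical map step: emit (nearest cluster id, (partial sum, count))."""
    yield nearest(point, centroids), (point, 1)

def kmeans_reduce(cluster_id, values):
    """Hypothetical reduce step: add partial sums and counts, then divide."""
    total, count = None, 0
    for vec, n in values:
        total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
        count += n
    return cluster_id, [x / count for x in total]

def run_iteration(points, centroids):
    """Simulate one job: map every point, group by key (shuffle), reduce each group."""
    groups = defaultdict(list)
    for p in points:
        for key, value in kmeans_map(p, centroids):
            groups[key].append(value)
    new = list(centroids)  # a cluster that receives no points keeps its old centroid
    for key, values in groups.items():
        _, centroid = kmeans_reduce(key, values)
        new[key] = centroid
    return new
```

Emitting (point, 1) pairs means a combiner could pre-sum vectors and counts locally, since the reducer divides the total sum by the total count instead of averaging averages.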

Re: Mahout parallel K-Means - algorithms analysis

2014-03-15 Thread Ted Dunning
We would love to help.

Can you say which program and which classes you are looking at?


In reply to hiroshi leon (hiroshi_8...@hotmail.com), Sat, Mar 15, 2014 at 12:58 PM.


Re: Mahout parallel K-Means - algorithms analysis

2014-03-15 Thread Suneel Marthi
The clustering code is in CIMapper and CIReducer. Following the clustering,
there is a cluster classification step, which is mapper-only.
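The mapper-only classification step described above can be pictured as follows: each point is tagged with its closest final centroid and written out directly, with no reduce phase because nothing needs to be aggregated. This sketch assumes simple hard assignment with Euclidean distance (math.dist, Python 3.8+); the names classify_map and classify_all are hypothetical, and Mahout's actual output format may differ.

```python
import math

def nearest(point, centroids):
    """Index of the closest centroid by Euclidean distance (first one wins ties)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def classify_map(point, centroids):
    """Hypothetical map-only step: emit (cluster id, point); no reducer follows."""
    return nearest(point, centroids), point

def classify_all(points, centroids):
    """Run the mapper over every input point, as a map-only job would."""
    return [classify_map(p, centroids) for p in points]
```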

Not sure about the reference paper; this stuff has been around for a long
time, but the k-means documentation on mahout.apache.org should explain the
approach.


In reply to hiroshi leon (hiroshi_8...@hotmail.com), Mar 15, 2014, at 5:36 PM.
 

RE: Mahout parallel K-Means - algorithms analysis

2014-03-15 Thread hiroshi leon
Hello Ted,

Thank you so much for your reply. The code I was checking is the run
function of the KMeansDriver class, then the buildCluster function in the
same class, and after that the iterateMR function of the ClusterIterator
class.
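The call chain described above, a driver that repeatedly launches one MapReduce pass per iteration, can be summarized as a convergence loop. The Python below is a hypothetical sketch, not the KMeansDriver or ClusterIterator source: each pass is condensed into a local function, and the stopping rule (stop once every centroid moves no more than convergence_delta, or after max_iterations passes) plus both parameter names are assumptions chosen for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the closest centroid by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def one_iteration(points, centroids):
    """One map + reduce pass done locally: assign points, then average per cluster."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = nearest(p, centroids)
        counts[k] += 1
        sums[k] = [s + x for s, x in zip(sums[k], p)]
    # A cluster that received no points keeps its previous centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]

def build_clusters(points, centroids, convergence_delta, max_iterations):
    """Hypothetical driver loop; returns (final centroids, number of passes run)."""
    for i in range(1, max_iterations + 1):
        new = one_iteration(points, centroids)
        converged = all(
            math.dist(old, cur) <= convergence_delta
            for old, cur in zip(centroids, new)
        )
        centroids = new
        if converged:
            return centroids, i
    return centroids, max_iterations
```

In a real MapReduce deployment each pass would be a separate job reading the previous pass's centroids, so the number of iterations directly determines how many jobs are launched.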

I would like to know where I can check the code that is implemented for the
mapper and the reducer. Is it in CIMapper.class and CIReducer.class?

Is there a research paper or pseudocode on which Mahout's parallel K-means
was based?

Thank you so much and have a nice day.

Best regards


 From: ted.dunn...@gmail.com
 Date: Sat, 15 Mar 2014 13:56:56 -0700
 Subject: Re: Mahout parallel K-Means - algorithms analysis
 To: user@mahout.apache.org
 