I am writing my own map and reduce methods to implement the K-Means algorithm on 
Hadoop 1.0.1 in Java. Although I found some example links for K-Means on Hadoop 
on blogs, I don't want to copy their code; as a learner I want to implement it 
myself. So I just need some ideas/clues. Below is the work I have already done.

I have Point and Cluster classes which are Writable. The Point class has an x 
coordinate, a y coordinate, and the Cluster to which the Point belongs. The 
Cluster class has an ArrayList that stores all the Point objects belonging to 
that Cluster, and it also has a centroid variable. I hope I am on the right 
track (if not, please correct me).
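
For reference, this is roughly what the two classes look like right now (just a 
simplified sketch of my own code; field names are my own and getters/setters are 
left out, and each class would of course sit in its own .java file):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.ArrayList;

    import org.apache.hadoop.io.Writable;

    // Point.java -- a 2-D point; clusterId records which cluster it
    // currently belongs to (-1 means "not yet assigned").
    public class Point implements Writable {
        double x;
        double y;
        int clusterId = -1;

        public void write(DataOutput out) throws IOException {
            out.writeDouble(x);
            out.writeDouble(y);
            out.writeInt(clusterId);
        }

        public void readFields(DataInput in) throws IOException {
            x = in.readDouble();
            y = in.readDouble();
            clusterId = in.readInt();
        }
    }

    // Cluster.java (same imports) -- a centroid plus the points
    // currently assigned to this cluster.
    public class Cluster implements Writable {
        Point centroid = new Point();
        ArrayList<Point> points = new ArrayList<Point>();

        public void write(DataOutput out) throws IOException {
            centroid.write(out);
            out.writeInt(points.size());
            for (Point p : points) {
                p.write(out);
            }
        }

        public void readFields(DataInput in) throws IOException {
            centroid.readFields(in);
            points.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                Point p = new Point();
                p.readFields(in);
                points.add(p);
            }
        }
    }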

Now, first of all, my input (a file containing some point coordinates) must be 
loaded into Point objects; that is, the input file must be mapped to Point 
instances. This should be done ONCE in the map class (but how?). After assigning 
values to each Point, some random Clusters must be chosen in the initial phase 
(this must also be done only ONCE, but how?). Then every Point must be mapped to 
every Cluster together with the distance between that Point and the centroid. In 
the reduce method, every Point will be checked and assigned to the Cluster that 
is nearest to it (by comparing the distances). Then a new centroid is calculated 
for each Cluster. (Should map and reduce be called recursively? If yes, where 
would all the initialization go? By initialization I mean loading the input into 
Point objects, which must be done ONCE initially, and choosing some random 
centroids, which also has to happen only ONCE.)
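
Just to make the question concrete, this is the rough mapper/reducer shape I am 
imagining for a single iteration (only a sketch, not what I want handed to me: 
the class names are placeholders, I assume points arrive as "x,y" text lines, 
and I have not shown how the current centroids would be loaded in setup()):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // One K-Means iteration. The current centroids are assumed to be read
    // in setup() from the job configuration or distributed cache (not shown).
    public class KMeansIteration {

        public static class AssignMapper
                extends Mapper<LongWritable, Text, IntWritable, Text> {

            private final List<double[]> centroids = new ArrayList<double[]>();

            @Override
            protected void setup(Context context) {
                // Hypothetical: parse this iteration's centroids from
                // context.getConfiguration() and fill the list above.
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Each input line is assumed to be "x,y".
                String[] parts = value.toString().split(",");
                double x = Double.parseDouble(parts[0].trim());
                double y = Double.parseDouble(parts[1].trim());

                // Find the nearest centroid by squared Euclidean distance.
                int nearest = 0;
                double best = Double.MAX_VALUE;
                for (int i = 0; i < centroids.size(); i++) {
                    double dx = x - centroids.get(i)[0];
                    double dy = y - centroids.get(i)[1];
                    double d = dx * dx + dy * dy;
                    if (d < best) {
                        best = d;
                        nearest = i;
                    }
                }
                // Emit: cluster id -> the point that belongs to it.
                context.write(new IntWritable(nearest), value);
            }
        }

        public static class RecomputeReducer
                extends Reducer<IntWritable, Text, IntWritable, Text> {

            @Override
            protected void reduce(IntWritable clusterId, Iterable<Text> points,
                    Context context) throws IOException, InterruptedException {
                // Average the coordinates of all points in this cluster
                // to get the new centroid for the next iteration.
                double sumX = 0, sumY = 0;
                long count = 0;
                for (Text p : points) {
                    String[] parts = p.toString().split(",");
                    sumX += Double.parseDouble(parts[0].trim());
                    sumY += Double.parseDouble(parts[1].trim());
                    count++;
                }
                context.write(clusterId,
                        new Text((sumX / count) + "," + (sumY / count)));
            }
        }
    }

Is this the right direction, and where does the one-time initialization fit in?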
One more question: should the value of the parameter K (which decides the total 
number of clusters) be assigned by the user, or will Hadoop decide it itself?

Could somebody please explain? I don't need the code, I want to write it myself; 
I just need the approach. Thank you.

-Ravi
