Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

KHATWANI PARTH BHARAT Sat, 20 May 2017 06:24:36 -0700

Hey Trevor,
I have completed the KMeans code and will commit it soon, following the
instructions you shared with me in the other mail chain.



Best Regards
Parth

On Sat, May 20, 2017 at 2:29 AM, Trevor Grant <trevor.d.gr...@gmail.com>
wrote:

> Bumping this-
>
> Parth, is there anything we can do to assist you?
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Mon, Apr 24, 2017 at 9:34 PM, KHATWANI PARTH BHARAT <
> h2016...@pilani.bits-pilani.ac.in> wrote:
>
> > @Trevor and @Dmitriy
> >
> > The tough bug in the aggregating transpose is fixed. One issue is still
> > left, and it is keeping me from completing the KMeans code: assigning the
> > row keys of the DRM to the "closest cluster index" that was found.
> > Consider the following matrix of data points:
> >
> > {
> >    0 => {0:1.0,    1: 1.0,    2: 1.0,   3: 3.0}
> >    1 => {0:1.0,    1: 2.0,    2: 3.0,   3: 4.0}
> >    2 => {0:1.0,    1: 3.0,    2: 4.0,   3: 5.0}
> >    3 => {0:1.0,    1: 4.0,    2: 5.0,   3: 6.0}
> >   }
> > Here 0 =>, 1 =>, 2 => and 3 => are the row keys. The zeroth column (0)
> > holds the values that will be used to store the count of points assigned
> > to each cluster, and columns 1 to 3 contain the coordinates of the data
> > points.
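A minimal plain-Scala sketch of the layout just described. It assumes the raw points arrive as arrays of coordinates. Each row gets a leading 1.0 in column 0, so that summing rows later yields the number of points per cluster. The helper name `withCountColumn` is hypothetical, and the sketch uses plain arrays rather than Mahout's DRM API.

```scala
object CountColumnLayout {
  // Hypothetical helper: prepend a 1.0 "count" cell (column 0) to every point,
  // so columns 1..n hold the original coordinates.
  def withCountColumn(points: Seq[Array[Double]]): Seq[Array[Double]] =
    points.map(p => 1.0 +: p)

  def main(args: Array[String]): Unit = {
    val raw = Seq(
      Array(1.0, 1.0, 3.0),
      Array(2.0, 3.0, 4.0),
      Array(3.0, 4.0, 5.0),
      Array(4.0, 5.0, 6.0))
    // Row key i -> row i, matching the example matrix in the message
    withCountColumn(raw).zipWithIndex.foreach { case (row, key) =>
      println(s"$key => ${row.mkString(", ")}")
    }
  }
}
```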
> >
> > After the cluster assignment step of the KMeans algorithm that @Dmitriy
> > outlined at the beginning of this mail chain, the above matrix should look
> > like this (assuming that the 0th and 1st data points are assigned to the
> > cluster with index 0, and the 2nd and 3rd data points to the cluster with
> > index 1):
> >
> >  {
> >    0 => {0:1.0,    1: 1.0,    2: 1.0,   3: 3.0}
> >    0 => {0:1.0,    1: 2.0,    2: 3.0,   3: 4.0}
> >    1 => {0:1.0,    1: 3.0,    2: 4.0,   3: 5.0}
> >    1 => {0:1.0,    1: 4.0,    2: 5.0,   3: 6.0}
> >  }
> >
> > To achieve the above result I am using the following lines of code:
> >
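> > import org.apache.mahout.math.scalabindings._
> > import org.apache.mahout.math.scalabindings.RLikeOps._
> > import org.apache.mahout.math.drm._
> > import org.apache.mahout.math.drm.RLikeDrmOps._
> >
> > // 11. Iterate over the data matrix (in DrmLike[Int] format) and re-key
> > //     each row with the index of its closest centroid.
> > //     (The name drmKeyedByCluster is illustrative.)
> > val drmKeyedByCluster = dataDrmX.mapBlock() {
> >   case (keys, block) =>
> >     for (row <- 0 until block.nrow) {
> >       val dataPoint = block(row, ::)
> >
> >       // 12. findTheClosestCentriod finds the centroid closest to the
> >       //     data point specified by "dataPoint"
> >       val closestIndex = findTheClosestCentriod(dataPoint, centriods)
> >
> >       // 13. Assign the closest centroid index to the row key
> >       keys(row) = closestIndex
> >     }
> >     keys -> block
> > }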
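To make the assignment step concrete outside Spark, here is a plain-Scala simulation of what the mapBlock body above is meant to do within one block. It assumes squared Euclidean distance over columns 1..n, skipping the count column 0, and that centroids hold coordinates only. The real findTheClosestCentriod is not shown in the thread, so `closestCentroid` and `reKey` are hypothetical stand-ins. The two centroids are chosen purely for illustration.

```scala
object AssignStep {
  // Hypothetical stand-in for findTheClosestCentriod: index of the centroid with
  // the smallest squared Euclidean distance, ignoring column 0 (the count cell).
  def closestCentroid(point: Array[Double], centroids: Seq[Array[Double]]): Int = {
    def dist2(c: Array[Double]): Double =
      (1 until point.length).map(i => math.pow(point(i) - c(i - 1), 2)).sum
    centroids.indices.minBy(j => dist2(centroids(j)))
  }

  // Replace each row key with its closest centroid index, keeping rows intact,
  // mirroring keys(row) = closestIndex inside mapBlock.
  def reKey(keys: Array[Int], block: Seq[Array[Double]],
            centroids: Seq[Array[Double]]): (Array[Int], Seq[Array[Double]]) = {
    val newKeys = keys.clone()
    for (row <- block.indices) newKeys(row) = closestCentroid(block(row), centroids)
    newKeys -> block
  }

  def main(args: Array[String]): Unit = {
    val block = Seq(
      Array(1.0, 1.0, 1.0, 3.0),
      Array(1.0, 2.0, 3.0, 4.0),
      Array(1.0, 3.0, 4.0, 5.0),
      Array(1.0, 4.0, 5.0, 6.0))
    val centroids = Seq(Array(1.0, 1.0, 3.0), Array(4.0, 5.0, 6.0))
    val (keys, rows) = reKey(Array(0, 1, 2, 3), block, centroids)
    // All four rows survive; only their keys change (to 0, 0, 1, 1 here)
    println(s"${keys.mkString(", ")} (${rows.size} rows)")
  }
}
```

Note that the block still carries four rows after re-keying. Duplicate keys are only a problem for whatever later step treats the key as a unique index.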
> >
> > But the result turns out to be:
> >
> >  {
> >    0 => {0:1.0,    1: 2.0,    2: 3.0,   3: 4.0}
> >    1 => {0:1.0,    1: 4.0,    2: 5.0,   3: 6.0}
> >  }
> >
> >
> > So is there anything wrong with the syntax of the above code? I am unable
> > to find any reference on how I should assign a value to the row keys.
> >
> > @Trevor, as you mentioned earlier in this mail chain:
> > "Got it- in short no.
> >
> > Think of the keys like a dictionary or HashMap.
> >
> > That's why everything is ending up on row 1."
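Trevor's dictionary analogy can be checked in plain Scala. Suppose the re-keyed rows are collected into a structure indexed by row key, which is an assumption about how the result was inspected. In that case, a later row with a duplicate key overwrites the earlier one, which reproduces exactly the two rows in the output above. This is an analogy, not Mahout's collect implementation. `collectByKey` is a hypothetical name.

```scala
object DuplicateKeys {
  // Rows keyed by cluster index after the assignment step (0, 0, 1, 1)
  val keyedRows: Seq[(Int, Seq[Double])] = Seq(
    0 -> Seq(1.0, 1.0, 1.0, 3.0),
    0 -> Seq(1.0, 2.0, 3.0, 4.0),
    1 -> Seq(1.0, 3.0, 4.0, 5.0),
    1 -> Seq(1.0, 4.0, 5.0, 6.0))

  // Indexing by key, like a HashMap: the last row seen for each key wins
  def collectByKey(rows: Seq[(Int, Seq[Double])]): Map[Int, Seq[Double]] = rows.toMap

  def main(args: Array[String]): Unit =
    collectByKey(keyedRows).toSeq.sortBy(_._1).foreach { case (k, v) =>
      println(s"$k => ${v.mkString(", ")}")
    }
}
```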
> >
> > But according to the algorithm outlined by @Dmitriy at the start of the
> > mail chain, assigning the same key to multiple rows is possible. The same
> > is also mentioned in the book written by Dmitriy and Andrew: rows having
> > the same row key are summed up when we take the aggregating transpose.
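The summing behaviour described in the book can be simulated the same way: group rows by key and add them element-wise. Under the layout from earlier in this message, column 0 of each sum is the number of points in that cluster. Dividing the remaining columns by that count gives the new centroid. This is a plain-Scala model of the idea, not Mahout's aggregating transpose, where the per-key sums appear as columns of the transposed matrix. `sumByKey` and `centroids` are hypothetical names.

```scala
object SumByKey {
  // Element-wise sum of all rows that share a key
  def sumByKey(rows: Seq[(Int, Seq[Double])]): Map[Int, Seq[Double]] =
    rows.groupBy(_._1).map { case (k, group) =>
      k -> group.map(_._2).reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    }

  // New centroid = coordinate sums (columns 1..n) divided by the count (column 0)
  def centroids(sums: Map[Int, Seq[Double]]): Map[Int, Seq[Double]] =
    sums.map { case (k, s) => k -> s.tail.map(_ / s.head) }

  def main(args: Array[String]): Unit = {
    val keyedRows = Seq(
      0 -> Seq(1.0, 1.0, 1.0, 3.0),
      0 -> Seq(1.0, 2.0, 3.0, 4.0),
      1 -> Seq(1.0, 3.0, 4.0, 5.0),
      1 -> Seq(1.0, 4.0, 5.0, 6.0))
    val sums = sumByKey(keyedRows)
    // key 0 -> 2.0, 3.0, 4.0, 7.0 ; key 1 -> 2.0, 7.0, 9.0, 11.0
    centroids(sums).toSeq.sortBy(_._1).foreach { case (k, c) =>
      println(s"$k => ${c.mkString(", ")}")
    }
  }
}
```

Keeping the count in column 0 means one pass of summation yields both the cluster sizes and the coordinate sums. No separate counting job is needed.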
> >
> > Now I am confused about whether what I have described above can be
> > achieved, cannot be achieved at all, or is a bug in the API.
> >
> >
> >
> > Thanks & Regards
> > Parth
> >
>
