[ 
https://issues.apache.org/jira/browse/MAHOUT-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861194#action_12861194
 ] 

Jeff Eastman edited comment on MAHOUT-297 at 4/26/10 9:15 PM:
--------------------------------------------------------------

I don't understand why the constructors for Canopy and KMeans Cluster were 
modified to override the given center vector types, as in:

{noformat}
   public Canopy(Vector point, int canopyId) {
     this.setId(canopyId);
-    this.setCenter(point.clone());
-    this.setPointTotal(point.clone());
+    this.setCenter(new RandomAccessSparseVector(point.clone()));
+    this.setPointTotal(getCenter().clone());
     this.setNumPoints(1);
   }
{noformat}

I can appreciate that it might be a performance fix in some situations, but 
forcing the center and total to be a different type from that of the argument 
strikes me as bad practice. With input vectors of arbitrary type, shouldn't the 
clusters honor the contract to do their math over that type?
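
To make the concern concrete, here is a minimal, self-contained sketch. Vec, 
DenseVec, SparseVec and ToyCluster are hypothetical stand-ins written for this 
illustration, not Mahout classes, and copy() stands in for clone(). It only 
shows how the two construction styles differ in the type the center ends up 
with:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a vector interface; not Mahout's Vector API. */
interface Vec {
    double get(int index);
    int size();
    /** Returns a copy with the same concrete type as the receiver. */
    Vec copy();
}

/** Toy dense vector backed by an array. */
final class DenseVec implements Vec {
    private final double[] values;
    DenseVec(double[] values) { this.values = values.clone(); }
    public double get(int index) { return values[index]; }
    public int size() { return values.length; }
    public Vec copy() { return new DenseVec(values); }
}

/** Toy sparse vector that stores only the non-zero entries. */
final class SparseVec implements Vec {
    private final Map<Integer, Double> nonZero = new HashMap<>();
    private final int size;
    /** Converting constructor: accepts any Vec, always yields a SparseVec. */
    SparseVec(Vec other) {
        this.size = other.size();
        for (int i = 0; i < size; i++) {
            double v = other.get(i);
            if (v != 0.0) {
                nonZero.put(i, v);
            }
        }
    }
    public double get(int index) { return nonZero.getOrDefault(index, 0.0); }
    public int size() { return size; }
    public Vec copy() { return new SparseVec(this); }
}

/** Toy cluster holding only a center, to contrast the two styles. */
final class ToyCluster {
    private final Vec center;
    private ToyCluster(Vec center) { this.center = center; }

    /** Style before the patch: the center keeps the caller's vector type. */
    static ToyCluster preservingType(Vec point) {
        return new ToyCluster(point.copy());
    }

    /** Style after the patch: the center is always converted to sparse. */
    static ToyCluster forcingSparse(Vec point) {
        return new ToyCluster(new SparseVec(point));
    }

    Vec getCenter() { return center; }
}

final class VectorTypeDemo {
    public static void main(String[] args) {
        Vec dense = new DenseVec(new double[] {1.0, 0.0, 3.0});
        // prints "DenseVec"
        System.out.println(
            ToyCluster.preservingType(dense).getCenter().getClass().getSimpleName());
        // prints "SparseVec"
        System.out.println(
            ToyCluster.forcingSparse(dense).getCenter().getClass().getSimpleName());
    }
}
```

In the second style the caller's choice of representation is silently 
discarded, whatever the reason it was chosen.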

I'm -1 on this part of the patch.

> Canopy and Kmeans clustering slows down on using SeqAccVector for center
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-297
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-297
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>         Attachments: MAHOUT-297.patch, MAHOUT-297.patch, MAHOUT-297.patch, 
> MAHOUT-297.patch, MAHOUT-297.patch
>
>


