[ 
https://issues.apache.org/jira/browse/MAHOUT-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899710#action_12899710
 ] 

Jeff Eastman commented on MAHOUT-479:
-------------------------------------

I'm making some progress on integrating the clustering data structures. Here's 
how the top levels are shaping up:

{code}
// this defines the minimal Dirichlet model
public interface Model<O> extends Writable {
  double pdf(O x);
  void observe(O x);
  int count();
  void computeParameters();
  Model<VectorWritable> sampleFromPosterior();
}

// this puts a face on a cluster
public interface Cluster {
  int getId();
  Vector getCenter();
  Vector getRadius();
  int getNumPoints();
  String asFormatString(String[] bindings);
  String asJsonString();
}

// here's the resulting Cluster class hierarchy
public abstract class AbstractCluster implements Cluster, Model<VectorWritable> {}
  public abstract class DistanceMeasureCluster extends AbstractCluster {}
    public class Canopy extends DistanceMeasureCluster {}
    public class Cluster extends DistanceMeasureCluster {}
      public class MeanShiftCanopy extends Cluster {}
      public class SoftCluster extends Cluster {}
  public class GaussianCluster extends AbstractCluster {}

// Note: all the current Dirichlet models can be subsumed by GaussianCluster or
// simple subclasses
{code}
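
To make the observe / computeParameters / pdf cycle concrete, here is a minimal sketch of how a model might satisfy that contract. It is only an illustration under stated assumptions: SimpleModel is a cut-down stand-in for Model (Writable and sampleFromPosterior() are left out so there is no Hadoop dependency), and ScalarGaussianModel and GaussianModelDemo are hypothetical names, not classes in Mahout.

```java
// Simplified stand-in for the Model interface above (assumption: Writable and
// sampleFromPosterior() are omitted to keep the sketch self-contained).
interface SimpleModel<O> {
  double pdf(O x);
  void observe(O x);
  int count();
  void computeParameters();
}

// Hypothetical one-dimensional Gaussian model; not an actual Mahout class.
class ScalarGaussianModel implements SimpleModel<Double> {
  // Assumed floor on the standard deviation so pdf() never divides by zero.
  private static final double MIN_STD = 1e-9;

  private int s0;            // number of observations
  private double s1;         // sum of observations
  private double s2;         // sum of squared observations
  private double mean = 0.0;
  private double std = 1.0;

  @Override
  public void observe(Double x) {
    s0++;
    s1 += x;
    s2 += x * x;
  }

  @Override
  public int count() {
    return s0;
  }

  @Override
  public void computeParameters() {
    if (s0 == 0) {
      return; // nothing observed yet, keep the current parameters
    }
    mean = s1 / s0;
    // Population variance from the running sums, clamped against rounding error.
    double variance = Math.max(0.0, s2 / s0 - mean * mean);
    std = Math.max(Math.sqrt(variance), MIN_STD);
  }

  @Override
  public double pdf(Double x) {
    double z = (x - mean) / std;
    return Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math.PI));
  }

  double getMean() {
    return mean;
  }

  double getStd() {
    return std;
  }
}

class GaussianModelDemo {
  public static void main(String[] args) {
    ScalarGaussianModel model = new ScalarGaussianModel();
    for (double x : new double[] {1.0, 2.0, 3.0}) {
      model.observe(x);
    }
    model.computeParameters();
    System.out.println("count=" + model.count() + " mean=" + model.getMean());
    System.out.println("pdf(2.0)=" + model.pdf(2.0) + " pdf(3.0)=" + model.pdf(3.0));
  }
}
```

The sketch keeps running sums (s0, s1, s2) in observe() and only derives the mean and standard deviation in computeParameters(), so observations can be accumulated cheaply and the parameters refreshed once per iteration. Whether the real cluster classes reset their sums after computeParameters() is not shown here.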

There's still a fair amount of cleanup to do (tests, etc.), but I thought I'd 
post this for visibility.


> Streamline classification/ clustering data structures
> -----------------------------------------------------
>
>                 Key: MAHOUT-479
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-479
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.1, 0.2, 0.3, 0.4
>            Reporter: Isabel Drost
>
> Opening this JIRA issue to collect ideas on how to streamline our 
> classification and clustering algorithms to make integration for users easier 
> as per mailing list thread http://markmail.org/message/pnzvrqpv5226twfs
> {quote}
> Jake and Robin and I were talking the other evening and a common lament was 
> that our classification (and clustering) stuff was all over the map in terms 
> of data structures.  Driving that to rest and getting those comments even 
> vaguely as plug and play as our much more advanced recommendation components 
> would be very, very helpful.
> {quote}
> This issue probably also relates to MAHOUT-287 (the intention there is to make 
> naive bayes run on vectors as input).
> Ted, Jake, Robin: It would be great if one of you could add a comment on 
> some of the issues you discussed "the other evening" and (if applicable) any 
> minor or major changes you think could help solve this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.