[ https://issues.apache.org/jira/browse/MAHOUT-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899710#action_12899710 ]
Jeff Eastman commented on MAHOUT-479:
-------------------------------------

I'm making some progress on integrating the clustering data structures. Here's how the top levels are shaping up:

{code}
// this defines the minimal Dirichlet model
public interface Model<O> extends Writable {
  double pdf(O x);
  void observe(O x);
  int count();
  void computeParameters();
  Model<VectorWritable> sampleFromPosterior();
}

// this puts a face on a cluster
public interface Cluster {
  int getId();
  Vector getCenter();
  Vector getRadius();
  int getNumPoints();
  String asFormatString(String[] bindings);
  String asJsonString();
}

// here's the resulting Cluster class hierarchy
public abstract class AbstractCluster implements Cluster, Model<VectorWritable> {}
public abstract class DistanceMeasureCluster extends AbstractCluster {}
public class Canopy extends DistanceMeasureCluster {}
// concrete Cluster class; must live in a different package from the Cluster interface above
public class Cluster extends DistanceMeasureCluster {}
public class MeanShiftCanopy extends Cluster {}
public class SoftCluster extends Cluster {}
public class GaussianCluster extends AbstractCluster {}
// Note: all the current Dirichlet models can be subsumed by GaussianCluster or simple subclasses
{code}

There's a fair amount of cleanup to do in the tests, etc., but I thought I'd post this for visibility.
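To make the Model/Cluster pairing a bit more concrete, here is a minimal, self-contained Java sketch of how a GaussianCluster that is both a Cluster and a Model might behave. It is an illustration under stated assumptions, not Mahout code: double[] stands in for Mahout's Vector/VectorWritable; Writable, sampleFromPosterior() and the asFormatString/asJsonString methods are left out; the covariance is assumed diagonal, with getRadius() returning per-dimension standard deviations; and the running-sum fields s0/s1/s2 are hypothetical names chosen for the sketch.

```java
import java.util.Arrays;

// Minimal sketch (not Mahout code): double[] stands in for Vector/VectorWritable.
class GaussianClusterSketch {

  // Simplified stand-in for Model<O>: Writable and sampleFromPosterior() omitted.
  interface Model<O> {
    double pdf(O x);
    void observe(O x);
    int count();
    void computeParameters();
  }

  // Simplified stand-in for the Cluster interface: formatting methods omitted.
  interface Cluster {
    int getId();
    double[] getCenter();
    double[] getRadius();
    int getNumPoints();
  }

  // A diagonal Gaussian that is both a Cluster and a Model, the same pairing
  // AbstractCluster expresses in the proposed hierarchy.
  static class GaussianCluster implements Cluster, Model<double[]> {
    private final int id;
    private final double[] center;
    private final double[] radius; // assumed: per-dimension standard deviation
    private int s0;                // number of observed points
    private final double[] s1;     // running sum of observed points
    private final double[] s2;     // running sum of squared coordinates

    GaussianCluster(int id, double[] center, double[] radius) {
      this.id = id;
      this.center = center.clone();
      this.radius = radius.clone();
      this.s1 = new double[center.length];
      this.s2 = new double[center.length];
    }

    @Override public int getId() { return id; }
    @Override public double[] getCenter() { return center.clone(); }
    @Override public double[] getRadius() { return radius.clone(); }
    @Override public int getNumPoints() { return s0; }
    @Override public int count() { return s0; }

    // Density of x as a product of independent 1-D normal densities.
    @Override public double pdf(double[] x) {
      double p = 1.0;
      for (int i = 0; i < x.length; i++) {
        double z = (x[i] - center[i]) / radius[i];
        p *= Math.exp(-0.5 * z * z) / (radius[i] * Math.sqrt(2 * Math.PI));
      }
      return p;
    }

    // Only accumulates sufficient statistics; center and radius are untouched.
    @Override public void observe(double[] x) {
      s0++;
      for (int i = 0; i < x.length; i++) {
        s1[i] += x[i];
        s2[i] += x[i] * x[i];
      }
    }

    // Maximum-likelihood mean and standard deviation from the statistics.
    // The statistics are not reset here; an iterative driver would reset them between passes.
    @Override public void computeParameters() {
      if (s0 == 0) {
        return;
      }
      for (int i = 0; i < center.length; i++) {
        double mean = s1[i] / s0;
        double variance = s2[i] / s0 - mean * mean;
        center[i] = mean;
        radius[i] = Math.sqrt(Math.max(variance, 1e-12)); // guard against zero/negative round-off
      }
    }
  }

  public static void main(String[] args) {
    GaussianCluster c = new GaussianCluster(0, new double[] {0, 0}, new double[] {1, 1});
    c.observe(new double[] {1, 2});
    c.observe(new double[] {3, 4});
    c.computeParameters();
    // prints "[2.0, 3.0] [1.0, 1.0] 2"
    System.out.println(Arrays.toString(c.getCenter()) + " " + Arrays.toString(c.getRadius()) + " " + c.count());
  }
}
```

Keeping observe() separate from computeParameters() lets the same object act as an accumulator during a pass over the data and as a fixed density during assignment, which is roughly the division of labor the Model interface suggests.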
> Streamline classification/clustering data structures
> -----------------------------------------------------
>
> Key: MAHOUT-479
> URL: https://issues.apache.org/jira/browse/MAHOUT-479
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering
> Affects Versions: 0.1, 0.2, 0.3, 0.4
> Reporter: Isabel Drost
>
> Opening this JIRA issue to collect ideas on how to streamline our
> classification and clustering algorithms to make integration easier for users,
> as per mailing list thread http://markmail.org/message/pnzvrqpv5226twfs
> {quote}
> Jake and Robin and I were talking the other evening and a common lament was
> that our classification (and clustering) stuff was all over the map in terms
> of data structures. Driving that to rest and getting those components even
> vaguely as plug and play as our much more advanced recommendation components
> would be very, very helpful.
> {quote}
> This issue probably also relates to MAHOUT-287 (the intention there is to make
> naive Bayes run on vectors as input).
> Ted, Jake, Robin: It would be great if one of you could add a comment on
> some of the issues you discussed "the other evening" and (if applicable) any
> minor or major changes you think could help solve this issue.