[ https://issues.apache.org/jira/browse/MAHOUT-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-251:
--------------------------------

    Attachment: MAHOUT-251.patch

This patch generalizes the 2-d dense models by introducing a new abstract VectorModelDistribution class with a modelPrototype field, which is used during sampleFromPrior() to properly initialize the respective models. The models themselves have also had their 2-d limits removed, though some of the pdf() computations are a bit suspect.

Models have always been Writable, and this patch makes ModelDistributions Writable too, since VectorModelDistributions need to be stateful. Several new tests have been added for Writable serialization, and other tests have been adjusted slightly.

The DirichletDriver code is a work in progress, as it needs additional parameters in order to properly initialize the DirichletState's modelFactory. Well, the whole thing is a WIP, but the unit tests run and I'm interested in getting more eyeballs on the approach.

> Generalize Dirichlet models and model distributions to handle n-d and sparse vectors
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-251
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-251
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.2
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-251.patch
>
>
> Users attempting to use Dirichlet Process Clustering on real-life problems
> cannot use any of the existing models or model distributions, as these have
> hard-coded assumptions of a 2-d DenseVector underlying data representation.
> These limitations are overly restrictive and the code needs to be
> generalized.
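
For readers who want a concrete picture of the prototype idea in the comment above, here is a rough, self-contained sketch. It is not the code in MAHOUT-251.patch. All class names prefixed with Sketch are hypothetical, a plain double[] stands in for Mahout's Vector, and the write()/readFields() pair only imitates the shape of Hadoop's Writable interface without depending on it. The assumption illustrated is simply that a stateful distribution carries a prototype whose size drives sampleFromPrior(), and that this state must therefore survive serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Random;

// Hypothetical stand-in for a model: an n-d mean with a scalar standard deviation.
final class SketchNormalModel {
  final double[] mean;
  final double stdDev;

  SketchNormalModel(double[] mean, double stdDev) {
    this.mean = mean;
    this.stdDev = stdDev;
  }
}

// Hypothetical stand-in for the abstract distribution described in the comment.
// The prototype is the only state; here it records just the dimensionality.
abstract class SketchVectorModelDistribution {
  private double[] modelPrototype;

  // No-arg constructor so an instance can be created and then filled by readFields().
  protected SketchVectorModelDistribution() {
  }

  protected SketchVectorModelDistribution(double[] modelPrototype) {
    this.modelPrototype = modelPrototype.clone();
  }

  double[] getModelPrototype() {
    return modelPrototype;
  }

  abstract SketchNormalModel[] sampleFromPrior(int howMany);

  // Writable-style serialization: length first, then each element.
  void write(DataOutput out) throws IOException {
    out.writeInt(modelPrototype.length);
    for (double value : modelPrototype) {
      out.writeDouble(value);
    }
  }

  void readFields(DataInput in) throws IOException {
    int size = in.readInt();
    modelPrototype = new double[size];
    for (int i = 0; i < size; i++) {
      modelPrototype[i] = in.readDouble();
    }
  }
}

// Concrete distribution: every sampled model has the prototype's dimensionality,
// so nothing here is hard-coded to two dimensions.
final class SketchNormalModelDistribution extends SketchVectorModelDistribution {
  private final Random random = new Random(42);

  SketchNormalModelDistribution() {
  }

  SketchNormalModelDistribution(double[] modelPrototype) {
    super(modelPrototype);
  }

  @Override
  SketchNormalModel[] sampleFromPrior(int howMany) {
    int size = getModelPrototype().length;
    SketchNormalModel[] models = new SketchNormalModel[howMany];
    for (int m = 0; m < howMany; m++) {
      double[] mean = new double[size];
      for (int i = 0; i < size; i++) {
        mean[i] = random.nextGaussian();
      }
      models[m] = new SketchNormalModel(mean, 1.0);
    }
    return models;
  }
}
```

Constructing new SketchNormalModelDistribution(new double[5]) and calling sampleFromPrior(3) yields three models with 5-element means. Writing the distribution to a DataOutputStream and reading it back into a fresh instance preserves that dimensionality, which is the property the new serialization tests in the patch are described as checking. A real implementation would presumably also preserve whether the prototype is dense or sparse, which a double[] cannot express.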