[ 
https://issues.apache.org/jira/browse/MAHOUT-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-251:
--------------------------------

    Attachment: MAHOUT-251.patch

This patch generalizes the 2-d dense models by introducing a new abstract 
VectorModelDistribution class with a modelPrototype field, which is used 
during sampleFromPrior() to properly initialize the respective models. The 
models themselves have also had their 2-d limits removed, though some of the 
pdf() computations are a bit suspect.
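
To make the prototype idea concrete, here is a rough sketch. It is not the 
code in the patch: only VectorModelDistribution, modelPrototype and 
sampleFromPrior() are names from the description above. Every Sketch* type is 
made up for illustration, and a plain double[] stands in for a Mahout Vector.

```java
import java.util.Random;

// Hypothetical stand-in for a model; not the Mahout Model interface.
interface SketchModel {
  double[] center();
}

// Hypothetical model whose mean has as many dimensions as the prototype.
final class SketchNormalModel implements SketchModel {
  private final double[] mean;

  SketchNormalModel(double[] mean) {
    this.mean = mean.clone();
  }

  @Override
  public double[] center() {
    return mean.clone();
  }
}

// Illustrative only: the distribution keeps a prototype so that sampled
// models take their size from it instead of assuming 2-d.
abstract class SketchVectorModelDistribution {
  protected final double[] modelPrototype;

  protected SketchVectorModelDistribution(double[] modelPrototype) {
    this.modelPrototype = modelPrototype.clone();
  }

  abstract SketchModel[] sampleFromPrior(int howMany, Random rng);
}

final class SketchNormalDistribution extends SketchVectorModelDistribution {
  SketchNormalDistribution(double[] modelPrototype) {
    super(modelPrototype);
  }

  @Override
  SketchModel[] sampleFromPrior(int howMany, Random rng) {
    SketchModel[] models = new SketchModel[howMany];
    for (int i = 0; i < howMany; i++) {
      // The prototype's length decides the dimensionality of each model.
      double[] mean = new double[modelPrototype.length];
      for (int d = 0; d < mean.length; d++) {
        mean[d] = rng.nextGaussian();
      }
      models[i] = new SketchNormalModel(mean);
    }
    return models;
  }
}
```

With a 5-element prototype, sampleFromPrior(3, rng) in this sketch returns 
three models that each have a 5-d center.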

Models have always been Writable, and this patch makes ModelDistributions 
Writable too, since VectorModelDistributions need to be stateful. Several new 
tests have been added for Writable serialization, and other tests have been 
adjusted slightly. The DirichletDriver code is a work in progress, as it needs 
additional parameters in order to properly initialize the DirichletState's 
modelFactory. 
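
The kind of serialization round trip the new tests exercise might look 
roughly like the sketch below. It is not taken from the patch. SketchWritable 
is a local stand-in that mirrors the two methods of Hadoop's Writable 
(write(DataOutput) and readFields(DataInput)) so the example compiles without 
Hadoop, and SketchStatefulDistribution and roundTrip() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as Hadoop's Writable.
interface SketchWritable {
  void write(DataOutput out) throws IOException;

  void readFields(DataInput in) throws IOException;
}

// Hypothetical stateful distribution: its only state is the prototype.
final class SketchStatefulDistribution implements SketchWritable {
  private double[] modelPrototype = new double[0];

  // No-arg constructor so an empty instance can be filled by readFields().
  SketchStatefulDistribution() {
  }

  SketchStatefulDistribution(double[] modelPrototype) {
    this.modelPrototype = modelPrototype.clone();
  }

  double[] prototype() {
    return modelPrototype.clone();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Length first, then the values, so readFields() knows how much to read.
    out.writeInt(modelPrototype.length);
    for (double value : modelPrototype) {
      out.writeDouble(value);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int length = in.readInt();
    modelPrototype = new double[length];
    for (int i = 0; i < length; i++) {
      modelPrototype[i] = in.readDouble();
    }
  }

  // Serialize to a byte array and read the bytes back into a fresh instance.
  static SketchStatefulDistribution roundTrip(SketchStatefulDistribution original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      SketchStatefulDistribution copy = new SketchStatefulDistribution();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test along these lines would then assert that the copy's prototype equals 
the original's.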

Well, the whole thing is a WIP, but the unit tests run and I'm interested in 
getting more eyeballs on the approach.

> Generalize Dirichlet models and model distributions to handle n-d and sparse 
> vectors
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-251
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-251
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.2
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-251.patch
>
>
> Users attempting to use Dirichlet Process Clustering on real-life problems 
> cannot use any of the existing models or model distributions, as these have 
> hard-coded assumptions of a 2-d DenseVector underlying data representation. 
> These limitations are overly restrictive and the code needs to be 
> generalized.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
