I can assign this to you, but you need an account at https://issues.apache.org. If you already have one, please send me your ID; I wasn't able to find you using just your name.
On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <adityana...@gmail.com> wrote:
> Hi Rahul,
>
> Thanks for the reply!
>
> I am working on implementing a Gaussian Mixture Model, assuming that the
> covariance matrix is the same for all the Gaussians.
> The JIRA that deals with GMM is MADLIB-410:
> https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>
> Can this be assigned to me, or how do I get it assigned to me?
>
> Thanks,
> Aditya
>
> On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <ri...@pivotal.io> wrote:
>
> > Hi Aditya,
> >
> > Welcome to the MADlib community!
> >
> > Gaussian mixture models are extremely useful and we would heartily welcome
> > a contribution for them. The SQLEM paper might be oversimplifying the
> > capabilities of the database (e.g. assuming there is no array type is
> > unnecessary for PostgreSQL). You could speed things up (both dev time and
> > execution time) by writing some of the functions in C++. K-means is an
> > example of how clustering is implemented.
> > IMO, assuming the same covariance matrix is reasonable. We could extend the
> > capabilities after the initial implementation is complete.
> >
> > There was some work started a long time ago that built perceptrons using
> > the convex framework (link <https://github.com/iyerr3/madlib/tree/mlp>).
> > There are still some bugs in that code since the trained network isn't
> > converging. You could start there or build a new module; either way, an
> > MLP module is frequently requested by the data science community.
> >
> > I would suggest starting with Gaussian mixtures and then moving to
> > perceptrons once the GMM work is completed.
> >
> > Feel free to ask questions on this forum. Looking forward to collaborating
> > with you.
> >
> > Best,
> > Rahul
> >
> > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain <adityana...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > My name is Aditya Nain, and I am a graduate student at the University of
> > > Florida.
> > > I have been learning MADlib for a while and want to contribute to MADlib.
> > > I went through some of the open stories in JIRA and started working on
> > > MADLIB-410:
> > >
> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> > >
> > > which is about implementing a Gaussian Mixture Model using the Expectation
> > > Maximization (EM) algorithm.
> > >
> > > I came across the following paper while searching for a distributed EM
> > > algorithm that can be implemented in MADlib.
> > >
> > > Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL using the EM
> > > algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000, Pages 559-570.
> > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
> > >
> > > I thought of implementing the approach discussed in the paper, but the
> > > paper assumes that the covariance matrix is the same for all the clusters
> > > (i.e. the covariance matrix is the same for all the Gaussian
> > > distributions). So, I wanted to know the opinion of the community on
> > > whether it's fine to go with the assumption made in the paper and
> > > implement it in MADlib.
> > >
> > > Also, MADlib currently doesn't have an implementation of a perceptron, nor
> > > did I find any open story related to it in JIRA. I came across the
> > > following paper, which talks about a distributed algorithm for the
> > > perceptron:
> > >
> > > Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training strategies for
> > > the structured perceptron"
> > > http://dl.acm.org/citation.cfm?id=1858068
> > >
> > > Would it be useful to have a distributed implementation of the perceptron
> > > in MADlib?
> > >
> > > Thanks,
> > > Aditya
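
For readers following the thread, here is a minimal sketch of the shared-covariance assumption discussed above, written in plain Python for illustration only. It is restricted to one dimension, where a covariance matrix shared by all Gaussians reduces to a single shared variance. It is neither the SQLEM formulation nor MADlib code; the function name em_shared_variance and the quantile-based initialisation are assumptions made for this example.

```python
import math


def em_shared_variance(xs, k, iters=100):
    """EM for a 1-D Gaussian mixture whose k components share one variance.

    Hypothetical helper for illustration; returns (weights, means, variance).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    weights = [1.0 / k] * k              # start from uniform mixing weights
    centre = sum(xs) / n
    # Start the shared variance at the overall variance (floored above zero).
    var = max(sum((x - centre) ** 2 for x in xs) / n, 1e-9)

    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | xs[i]).
        # Because the variance is shared, the Gaussian normalising constant
        # is the same for every component and cancels, so it is omitted.
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) - (x - means[j]) ** 2 / (2.0 * var)
                    for j in range(k)]
            top = max(logs)              # subtract the max for stability
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])

        # M-step: per-component weights and means, one pooled variance.
        counts = [sum(r[j] for r in resp) for j in range(k)]
        weights = [max(c / n, 1e-12) for c in counts]   # keep log() defined
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / counts[j]
                 if counts[j] > 0.0 else means[j]
                 for j in range(k)]
        var = max(sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs) for j in range(k)) / n, 1e-9)

    return weights, means, var
```

Calling em_shared_variance(data, 2) on a list of floats returns the mixing weights, the two means, and the single pooled variance. The only difference from the general EM update is in the M-step: the squared deviations are pooled over all components and divided by the total number of points, rather than being normalised per component. A multi-dimensional version would pool outer products into one covariance matrix in the same way.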