Hi,

My name is Aditya Nain, and I am a graduate student at the University of
Florida. I have been learning MADlib for a while and would like to
contribute to it. I went through some of the open stories in JIRA and
started working on MADLIB-410:

https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB

which is about implementing a Gaussian Mixture Model using the
Expectation-Maximization (EM) algorithm.

I came across the following paper while searching for a distributed EM
algorithm that could be implemented in MADlib:

Carlos Ordonez and Paul Cereghini, "SQLEM: fast clustering in SQL using
the EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000,
pages 559-570.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564

I thought of implementing the approach discussed in the paper, but the
paper assumes that the covariance matrix is the same for all clusters
(i.e., all the Gaussian distributions share a single covariance matrix).
I would like to know whether the community thinks it is fine to go with
this assumption and implement it in MADlib.
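
To make the shared-covariance assumption concrete, here is a minimal
sketch in plain Python (it is not MADlib code and not the SQL formulation
from the paper). The function name em_tied_diag is hypothetical, and the
further restriction to a diagonal covariance is an assumption of this
sketch, made only to keep it short. Because every component uses the same
covariance, the Gaussian normalization constant cancels in the E-step.

```python
import math

def em_tied_diag(points, init_means, n_iter=25):
    """EM for a Gaussian mixture whose components all share ONE covariance.

    Assumption of this sketch: the shared covariance is diagonal, so it is
    stored as a list of per-dimension variances.
    """
    n, d, k = len(points), len(points[0]), len(init_means)
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    var = [1.0] * d  # shared per-dimension variance

    for _ in range(n_iter):
        # E-step: responsibilities. The normalization constant is identical
        # for all components (same covariance), so it is omitted.
        resp = []
        for x in points:
            logp = [
                math.log(max(weights[j], 1e-300))
                - 0.5 * sum((x[t] - means[j][t]) ** 2 / var[t] for t in range(d))
                for j in range(k)
            ]
            top = max(logp)  # subtract the max for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])

        # M-step: per-component weights and means, one pooled variance.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [nk[j] / n for j in range(k)]
        means = [
            [sum(resp[i][j] * points[i][t] for i in range(n)) / nk[j]
             for t in range(d)]
            for j in range(k)
        ]
        var = [
            max(
                sum(resp[i][j] * (points[i][t] - means[j][t]) ** 2
                    for i in range(n) for j in range(k)) / n,
                1e-6,  # floor to avoid a degenerate zero variance
            )
            for t in range(d)
        ]
    return weights, means, var
```

Dropping the assumption would mean keeping one covariance per component
in the M-step instead of pooling over all components.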

Also, MADlib currently doesn't have an implementation of a perceptron, nor
did I find any open story related to it in JIRA. I came across the
following paper, which describes distributed training for the perceptron:

Ryan McDonald, Keith Hall, and Gideon Mann, "Distributed training
strategies for the structured perceptron"
http://dl.acm.org/citation.cfm?id=1858068

Would it be useful to have a distributed implementation of the perceptron
in MADlib?
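
As a rough illustration of what such an implementation could look like,
here is a minimal sketch of iterative parameter mixing in plain Python:
each data shard runs one perceptron pass starting from the current mixed
weights, and the per-shard weights are then averaged. This sketch uses a
plain binary perceptron rather than the structured one, uniform averaging,
and a sequential loop standing in for the parallel step. All of these are
simplifying assumptions, and the function names are hypothetical.

```python
def perceptron_epoch(shard, w):
    """One perceptron pass over a shard, starting from weights w."""
    w = list(w)
    for x, y in shard:  # y is -1 or +1
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # misclassified (or on the boundary): update
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def iterative_parameter_mixing(shards, dim, epochs=10):
    """Train per shard, then average the weights after every epoch."""
    w = [0.0] * dim
    for _ in range(epochs):
        # In a real system each shard would be processed in parallel
        # (for example, one per database segment); here it is sequential.
        local = [perceptron_epoch(shard, w) for shard in shards]
        # Uniform mixing: simple average of the per-shard weight vectors.
        w = [sum(ws[t] for ws in local) / len(local) for t in range(dim)]
    return w

def predict(w, x):
    """Return the predicted label (+1 or -1) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Toy data: the first feature is a constant bias term.
shards = [
    [((1.0, 2.0, 1.0), 1), ((1.0, -1.0, -2.0), -1)],
    [((1.0, 1.5, 0.5), 1), ((1.0, -2.0, -1.0), -1)],
]
weights = iterative_parameter_mixing(shards, dim=3)
```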

Thanks,
Aditya
