Parallelization of Baum-Welch Algorithm for HMM Training 
---------------------------------------------------------

                 Key: MAHOUT-627
                 URL: https://issues.apache.org/jira/browse/MAHOUT-627
             Project: Mahout
          Issue Type: New Feature
          Components: Classification
    Affects Versions: 0.4
            Reporter: Dhruv Kumar
            Priority: Minor


Among the unsupervised learning methods available in the current sequential 
implementation of HMM training (MAHOUT-396), the Baum-Welch (BW) algorithm is 
an attractive candidate for a parallel, MapReduce implementation. Although 
slower than Viterbi training, BW is more numerically stable and is guaranteed 
to converge monotonically to a local maximum of the likelihood function. 
Viterbi training approximates the Maximum Likelihood Estimate (MLE) by 
considering only the single most probable state sequence, trading estimation 
quality for reduced computation time.
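
For reference, one BW iteration computes the posterior expectations gamma and 
xi from the forward variables (alpha) and backward variables (beta), and then 
re-estimates the parameters from those expectations. These are the standard 
textbook formulas in Rabiner's notation, not code from MAHOUT-396:

    \gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}, \qquad
    \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}

    \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
    \bar{b}_j(k) = \frac{\sum_{t\,:\,o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}

The numerators and denominators are sums of per-sequence quantities, which is 
what makes the E-step embarrassingly parallel across observation sequences.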

A parallel, MapReduce implementation of BW will allow scalable model learning 
over large data sets. The resulting model can be used for prediction using the 
current sequential implementation of the Viterbi decoding algorithm. 
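
A minimal sketch of how one BW iteration could be decomposed over Hadoop 
MapReduce: mappers perform the E-step independently per observation sequence, 
and a reducer aggregates the expected counts and performs the M-step. All 
class and method names below are hypothetical illustrations, not part of 
Mahout's or Hadoop's API; in particular, BaumWelchUtils stands in for the 
per-sequence forward-backward computation. Only the transition-matrix update 
is shown; the initial and emission probabilities would be re-estimated the 
same way.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    // E-step: each mapper runs forward-backward over one observation
    // sequence under the current model parameters and emits, for every
    // hidden state i, the vector of expected transition counts
    // sum_t xi_t(i, j). BaumWelchUtils.expectedTransitionCounts() is a
    // hypothetical helper, not an existing Mahout API.
    class BaumWelchMapper
        extends Mapper<LongWritable, Text, IntWritable, VectorWritable> {

      @Override
      protected void map(LongWritable offset, Text line, Context ctx)
          throws IOException, InterruptedException {
        int[] obs = parseObservations(line.toString());
        Vector[] counts = BaumWelchUtils.expectedTransitionCounts(obs);
        for (int i = 0; i < counts.length; i++) {
          ctx.write(new IntWritable(i), new VectorWritable(counts[i]));
        }
      }

      // Assumes one whitespace-delimited sequence of observed symbol
      // ids per input line.
      private static int[] parseObservations(String line) {
        String[] tokens = line.trim().split("\\s+");
        int[] obs = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
          obs[i] = Integer.parseInt(tokens[i]);
        }
        return obs;
      }
    }

    // M-step: the reducer sums the expected counts for row i of the
    // transition matrix across all sequences and normalizes the row,
    // yielding the re-estimated transition probabilities a_ij.
    class BaumWelchReducer
        extends Reducer<IntWritable, VectorWritable, IntWritable, VectorWritable> {

      @Override
      protected void reduce(IntWritable state, Iterable<VectorWritable> counts,
          Context ctx) throws IOException, InterruptedException {
        Vector sum = null;
        for (VectorWritable v : counts) {
          // Hadoop reuses the value object, so copy on first use.
          sum = (sum == null) ? v.get().clone() : sum.plus(v.get());
        }
        ctx.write(state, new VectorWritable(sum.divide(sum.zSum())));
      }
    }

A driver would iterate this job, feeding each iteration's re-estimated model 
back to the mappers until the likelihood converges, much as Mahout's k-means 
driver iterates its cluster-assignment job.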

BW is a special case of the Expectation Maximization (EM) algorithm, and Chu 
et al. have shown that EM algorithms in general fit the MapReduce programming 
model: http://www-cs.stanford.edu/~ang/papers/nips06-mapreducemulticore.pdf. 
Also, k-means clustering, which can be viewed as a hard-assignment variant of 
EM, already has a MapReduce implementation in Mahout, which hints at the 
possibility of code reuse to aid parallel BW development. Another possibility 
for a parallel implementation would be to use the matrix formulation of the 
HMM recursions, as shown by Turin: 
http://tagh.de/tom/wp-content/uploads/turin-1998.pdf. 
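
To sketch the idea behind the matrix formulation (standard HMM notation, not 
taken verbatim from Turin's paper): each step of the forward recursion is a 
row-vector times matrix product,

    \alpha_{t+1}^{\top} = \alpha_t^{\top}\, A\, \mathrm{diag}\big(b_1(o_{t+1}), \ldots, b_N(o_{t+1})\big)
                        = \alpha_t^{\top}\, M(o_{t+1})

so the full forward pass is a chain of matrix products,

    \alpha_T^{\top} = \alpha_1^{\top}\, M(o_2)\, M(o_3) \cdots M(o_T),

and because matrix multiplication is associative, partial products over chunks 
of the observation sequence can be computed independently in parallel and 
combined afterwards.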

Although non-trivial, a parallel BW implementation would be a valuable 
addition to Mahout's existing HMM functionality. 
