Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT) Page: Parallel Viterbi (https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Viterbi)
Edited by Sergey Bartunov:

The Viterbi algorithm is an inference algorithm (synonyms: segmentation, decoding) for Hidden Markov Models \[1\] which finds the most likely sequence of hidden states given a sequence of observed states. Apache Mahout has both a [sequential|Hidden Markov Models] and a parallel (the subject of this page) implementation of the algorithm.

_This article is a work in progress._

h3. Parallelization

The strategy is quite straightforward and based on data parallelism. There are some studies on parallelizing Viterbi (and Belief Propagation, the inference algorithm for loop-free Markov Random Fields, which is quite similar to Viterbi), but at the time of writing none of them seem applicable to the MapReduce paradigm.

Input sequences of observed variables are divided into chunks of a length small enough that the O(N*K) data for a chunk fits in RAM. The algorithm processes the data chunk by chunk (that is, first all chunks number one, then all chunks number two, and so on), performing forward and backward Viterbi passes.

h3. Usage

Run "bin/mahout pviterbi" and see what it wants from you. That is:
* a serialized HmmModel (e.g. produced by the LossyHmmModelSerializer class)
* input data (observed sequences) in the format described above

*References*
# [Wikipedia article|http://en.wikipedia.org/wiki/Viterbi_algorithm]
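The chunk-by-chunk scheme described above can be sketched in plain Java. This is a minimal single-machine sketch, not Mahout's actual API; the class and method names are illustrative. Only the O(N) delta vector is carried between chunks during the forward pass, while each chunk keeps its own back-pointer table (in the MapReduce version, that table would be written out by the worker handling the chunk); the backward pass then walks the chunks in reverse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal single-machine sketch of chunked Viterbi decoding.
 * Names are illustrative, not Mahout's actual API.
 * Log probabilities are used to avoid underflow on long sequences.
 */
public class ChunkedViterbi {
    // Toy model: 2 hidden states, 2 observable symbols.
    static final double[] logPi = logs(new double[]{0.6, 0.4});
    static final double[][] logA = {logs(new double[]{0.7, 0.3}), logs(new double[]{0.4, 0.6})};
    static final double[][] logB = {logs(new double[]{0.9, 0.1}), logs(new double[]{0.2, 0.8})};

    static double[] logs(double[] p) {
        double[] r = new double[p.length];
        for (int i = 0; i < p.length; i++) r[i] = Math.log(p[i]);
        return r;
    }

    /**
     * Forward pass over one chunk of observations: updates delta in
     * place and returns this chunk's back-pointer table.
     */
    static int[][] forwardChunk(double[] delta, int[] chunk) {
        int n = delta.length;
        int[][] psi = new int[chunk.length][n];
        for (int t = 0; t < chunk.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    double s = delta[i] + logA[i][j];
                    if (s > bestScore) { bestScore = s; best = i; }
                }
                next[j] = bestScore + logB[j][chunk[t]];
                psi[t][j] = best;  // best predecessor of state j at this step
            }
            System.arraycopy(next, 0, delta, 0, n);
        }
        return psi;
    }

    /** Full decode: forward over the chunks, then backward over them. */
    static int[] decode(int[] obs, int chunkLen) {
        int n = logPi.length;
        double[] delta = new double[n];
        for (int j = 0; j < n; j++) delta[j] = logPi[j] + logB[j][obs[0]];

        List<int[][]> psiChunks = new ArrayList<>();
        for (int start = 1; start < obs.length; start += chunkLen) {
            int end = Math.min(start + chunkLen, obs.length);
            psiChunks.add(forwardChunk(delta, Arrays.copyOfRange(obs, start, end)));
        }

        // Backward pass: pick the best final state, then follow the
        // back-pointers from the last chunk to the first.
        int[] path = new int[obs.length];
        int state = 0;
        for (int j = 1; j < n; j++) if (delta[j] > delta[state]) state = j;
        int t = obs.length - 1;
        path[t] = state;
        for (int c = psiChunks.size() - 1; c >= 0; c--) {
            int[][] psi = psiChunks.get(c);
            for (int k = psi.length - 1; k >= 0; k--) {
                state = psi[k][state];
                path[--t] = state;
            }
        }
        return path;
    }

    public static void main(String[] args) {
        // prints [0, 0, 1, 1]
        System.out.println(Arrays.toString(decode(new int[]{0, 0, 1, 1}, 2)));
    }
}
```

Note that the chunk length only affects memory layout, not the result: decoding with any chunk length yields the same most-likely path as the unchunked algorithm.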
