On Tue, Mar 22, 2011 at 1:15 AM, Robin Anil <[email protected]> wrote:

> You can discuss on the group and update the JIRA with concrete plans
>

As far as parallelizing Baum-Welch (BW) goes, there are two broad
approaches I'm interested in:

1. As Ted had suggested, a lot of code in the existing K-Means
implementation could be re-used to implement BW. This is because they're
both Expectation Maximization algorithms.

2. Another very interesting possibility is to express BW as a recursive
join. There's an interesting offshoot of Hadoop called HaLoop (
http://code.google.com/p/haloop/) which supports loop control and caching
of intermediate results at the mapper inputs, reducer inputs and
reducer outputs to improve performance. The paper [1] describes this in more
detail. They have implemented k-means as a recursive join.
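
To make the first approach concrete, here is a rough sketch of how one BW
iteration could split into a map phase and a reduce phase, in the same way
k-means does. It is plain Python rather than Hadoop code, it is not Mahout's
API, and all function names (e_step, combine, m_step, baum_welch_iteration)
are hypothetical. It assumes integer-coded observations, strictly positive
starting parameters and sequences of length two or more.

```python
import math
from functools import reduce

def e_step(seq, pi, A, B):
    """Map phase: scaled forward-backward on ONE observation sequence.
    Returns expected counts (init, trans, emit) and the log-likelihood."""
    n, m, T = len(pi), len(B[0]), len(seq)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][seq[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][seq[t]]
                   for j in range(n)]
        s = sum(row)                      # scaling factor avoids underflow
        scale.append(s)
        alpha.append([x / s for x in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][seq[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    init = [alpha[0][i] * beta[0][i] for i in range(n)]
    trans = [[0.0] * n for _ in range(n)]
    emit = [[0.0] * m for _ in range(n)]
    for t in range(T):
        for i in range(n):
            emit[i][seq[t]] += alpha[t][i] * beta[t][i]
            if t < T - 1:
                for j in range(n):
                    trans[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                    * beta[t + 1][j] / scale[t + 1])
    return init, trans, emit, sum(math.log(s) for s in scale)

def combine(a, b):
    """Reduce phase: element-wise sum of two sets of expected counts."""
    add = lambda x, y: [p + q for p, q in zip(x, y)]
    return (add(a[0], b[0]),
            [add(r, s) for r, s in zip(a[1], b[1])],
            [add(r, s) for r, s in zip(a[2], b[2])],
            a[3] + b[3])

def m_step(stats):
    """Normalize the summed counts into new model parameters."""
    norm = lambda row: [x / sum(row) for x in row]
    init, trans, emit, loglik = stats
    return norm(init), [norm(r) for r in trans], [norm(r) for r in emit], loglik

def baum_welch_iteration(seqs, pi, A, B):
    """One EM iteration: map e_step over sequences, reduce, then re-estimate.
    The returned log-likelihood is measured under the INPUT parameters."""
    return m_step(reduce(combine, (e_step(s, pi, A, B) for s in seqs)))
```

A driver would call baum_welch_iteration repeatedly until the
log-likelihood stops improving. That outer loop is the part HaLoop's loop
control would take over, with the observation sequences being the
loop-invariant data worth caching.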

In either case, I want to clearly define the scope and task list. BW will be
the core of the project, but I have two questions:

1. Does it make sense to implement the "counting method" for model
discovery as well? It is clearly inferior, but would it be a good reference
for comparison with BW? Is there any other added benefit?

2. What has been the standard in the past GSoC Mahout projects regarding
unit testing and documentation?
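
For reference on the first question, this is roughly what I mean by the
counting method, assuming it refers to estimating the model by counting
transitions and emissions in sequences whose hidden states are already
labeled. It is a minimal Python sketch; count_estimate is a hypothetical
name, and states and symbols are assumed to be integer-coded.

```python
def count_estimate(labeled_seqs, n_states, n_symbols):
    """Estimate (pi, A, B) from sequences of (state, observation) pairs
    by counting and normalizing. No smoothing is applied."""
    init = [0] * n_states
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_symbols for _ in range(n_states)]
    for seq in labeled_seqs:
        states = [s for s, _ in seq]
        init[states[0]] += 1
        for s, o in seq:
            emit[s][o] += 1
        for a, b in zip(states, states[1:]):
            trans[a][b] += 1

    def norm(row):
        total = sum(row)
        # A state that was never seen keeps an all-zero row.
        return [x / total if total else 0.0 for x in row]

    return norm(init), [norm(r) for r in trans], [norm(r) for r in emit]
```

Unlike BW it needs labeled hidden states, so at most it could serve as a
baseline or as a way to pick starting parameters for BW.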

In the meantime, I've been learning more about the internals of Mahout,
MapReduce and Hadoop. One of my course projects this semester is to implement
the Bellman Iteration algorithm on MapReduce, and so far it has been coming
along well.

Any feedback is much appreciated.

Dhruv
