On Tue, Mar 22, 2011 at 1:15 AM, Robin Anil <[email protected]> wrote:
> You can discuss on the group and update the JIRA with concrete plans
>

As far as making Baum-Welch (BW) parallel goes, there are two broad approaches I'm interested in:

1. As Ted suggested, a lot of the code in the existing K-Means implementation could be reused to implement BW, because both are Expectation Maximization algorithms.

2. Another very interesting possibility is to express BW as a recursive join. There's an offshoot of Hadoop called Haloop (http://code.google.com/p/haloop/) which supports loop control and caches intermediate results at the mapper inputs, reducer inputs and reducer outputs to improve performance. The paper [1] describes this in more detail. They have implemented k-means as a recursive join.

In either case, I want to clearly define the scope and task list. BW will be the core of the project, but:

1. Does it make sense to implement the "counting method" for model discovery as well? It is clearly inferior, but would it be a good reference for comparison with BW? Is there any added benefit?

2. What has been the standard in past GSoC Mahout projects regarding unit testing and documentation?

In the meantime, I've been learning more about Mahout, MapReduce and Hadoop's internals. One of my course projects this semester is to implement the Bellman iteration algorithm on MapReduce, and so far it has been coming along well.

Any feedback is much appreciated.

Dhruv
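
P.S. To make approach 1 a bit more concrete, here is a rough sketch of how one BW iteration could split along the same lines as k-means: a per-sequence E-step (the "map"), an element-wise sum of expected counts (the "combine/reduce"), and a normalisation (the M-step). This is plain Python for illustration only, not Hadoop or Mahout code. All function names (expected_counts, merge, baum_welch_iteration) and the toy 2-state model are made up for this example, and it assumes discrete observations and strictly positive starting parameters.

```python
from functools import reduce

def expected_counts(seq, pi, A, B):
    """'Map' step: scaled forward-backward on ONE sequence.
    Returns expected (initial, transition, emission) counts."""
    n, m, T = len(pi), len(B[0]), len(seq)
    alpha, scale = [], []
    for t, obs in enumerate(seq):
        if t == 0:
            row = [pi[i] * B[i][obs] for i in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs]
                   for j in range(n)]
        c = sum(row)                      # per-step scaling factor
        scale.append(c)
        alpha.append([x / c for x in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        nxt = seq[t + 1]
        beta[t] = [sum(A[i][j] * B[j][nxt] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    # With this scaling, alpha[t][i] * beta[t][i] is the state posterior.
    init = [alpha[0][i] * beta[0][i] for i in range(n)]
    trans = [[0.0] * n for _ in range(n)]
    emit = [[0.0] * m for _ in range(n)]
    for t, obs in enumerate(seq):
        for i in range(n):
            emit[i][obs] += alpha[t][i] * beta[t][i]
            if t + 1 < T:
                nxt = seq[t + 1]
                for j in range(n):
                    trans[i][j] += (alpha[t][i] * A[i][j] * B[j][nxt]
                                    * beta[t + 1][j] / scale[t + 1])
    return init, trans, emit

def merge(a, b):
    """'Combine/reduce' step: element-wise sum of two partial count sets."""
    init = [x + y for x, y in zip(a[0], b[0])]
    trans = [[x + y for x, y in zip(r, s)] for r, s in zip(a[1], b[1])]
    emit = [[x + y for x, y in zip(r, s)] for r, s in zip(a[2], b[2])]
    return init, trans, emit

def normalise(row):
    total = sum(row)
    return [x / total for x in row]

def baum_welch_iteration(sequences, pi, A, B):
    """One EM iteration: map over sequences, reduce the counts, re-estimate."""
    partials = (expected_counts(s, pi, A, B) for s in sequences)
    init, trans, emit = reduce(merge, partials)
    return (normalise(init),
            [normalise(r) for r in trans],
            [normalise(r) for r in emit])

if __name__ == "__main__":
    # Toy model: 2 hidden states, 3 observation symbols (made-up numbers).
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    data = [[0, 1, 2], [2, 2, 1, 0], [0, 0, 1]]
    for _ in range(5):
        pi, A, B = baum_welch_iteration(data, pi, A, B)
    print(pi, A, B)
```

The point of the sketch is that the expected counts play the same role as the per-cluster partial sums in k-means: they are additive across sequences, so they can be computed independently and summed in any order before the model is re-estimated. That is the part I would expect to carry over to a MapReduce job, though the actual job structure is still open.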
