On Wed, Apr 6, 2011 at 12:38 PM, Ted Dunning <[email protected]> wrote:
>> 3. Is there a place where I can find guidelines for code formatting
>> specific to Mahout? Things like indentation, class names, comments etc.
>
> Lucene standard which is Sun standard with 2 space indentation.
>
> This page might help especially near the bottom:
> https://cwiki.apache.org/MAHOUT/how-to-contribute.html
>
> Note to others, this wiki page has lots more information than the
> how-to-contribute page that is linked from our main site.

Thanks Ted, the Eclipse code style sheet proved very useful. Also, your feedback regarding the order of programming the driver, mapper, reducer, and combiner gave me new insight.

My goal now is to create an entire mock chain of driver-mapper-combiner-reducer for the new HMM functionality and write simple unit tests for them before the project formally begins. This should give me a good skeleton to work against during the summer.

Here is my next set of questions:

1. Automatic and incremental build: Being new to Maven, I'm a little confused. While I update the code, the Maven console in Eclipse reports either an auto build or an incremental build:

4/10/11 12:11:02 PM EDT: Maven Builder: AUTO_BUILD
4/10/11 12:11:02 PM EDT: Maven Builder: AUTO_BUILD
4/10/11 12:12:26 PM EDT: Maven Builder: AUTO_BUILD
4/10/11 12:13:02 PM EDT: Maven Builder: INCREMENTAL_BUILD

These messages are triggered every time I make changes to the code. I looked into the pom.xml and it lists 1.6 as the version of Sun's javac to be used for compilation. As far as I know, javac is not an incremental builder, whereas Eclipse's compiler is. How is this possible? Also, what is the difference between AUTO_BUILD and INCREMENTAL_BUILD?

2. Package location for Map Reduce HMM training.
I noticed that the Map Reduce implementations of the different classifiers are located under separate MapReduce packages (o.a.m.classifier.bayes.mapreduce.bayes, o.a.m.classifier.bayes.mapreduce.cbayes), whereas the Map Reduce classes for k-means clustering are lumped under o.a.m.clustering with no separate Map Reduce package. What is the convention here?

Keeping in mind future extensibility and the overall architecture of Mahout's code, where should I place the new HMM Baum Welch Map Reduce code:

o.a.m.classifier.sequencelearning.hmm
or
o.a.m.classifier.sequencelearning.hmm.mapreduce
or
o.a.m.classifier.sequencelearning.hmm.mapreduce.baumwelch
or somewhere else...

Dhruv
