[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795133#action_12795133 ]
Robin Anil commented on MAHOUT-220:
-----------------------------------

Anyway, I guess we are sounding like ML engineers here. This is a library; our job is to have options for people like us to debate over :). So let's agree on a common mechanism, i.e. have different ways to create a term frequency vector (List<String> => SparseVector) from documents. Once the SparseVector is created, use uniform M/R jobs to do things like tf-idf weighting and log-likelihood (although I think we need the original file, not the SparseVector, to get the co-occurrence).

Any ideas?

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1. Line length used is 120 instead of 80.
> 2. static final log is kept as is, not LOG.
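
To make the mechanism proposed in the comment concrete, here is a minimal single-machine sketch of the two stages: tokens to a term frequency vector, then a uniform weighting pass (tf-idf here). It is only an illustration under stated assumptions, not Mahout code: a plain Map stands in for SparseVector, the class and method names (TermVectorSketch, termFrequencies, documentFrequencies, tfIdf) are hypothetical, the weighting used is the common tf * ln(N / df) variant, and the loops stand in for what would be M/R jobs. Log-likelihood is left out because, as noted above, it would likely need co-occurrence counts from the original text.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a Map<String, ...> stands in for Mahout's SparseVector.
final class TermVectorSketch {

  // Stage 1: List<String> => sparse term frequency vector (term -> count).
  static Map<String, Integer> termFrequencies(List<String> tokens) {
    Map<String, Integer> tf = new HashMap<>();
    for (String token : tokens) {
      tf.merge(token, 1, Integer::sum);
    }
    return tf;
  }

  // Number of documents in which each term occurs at least once.
  static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> vectors) {
    Map<String, Integer> df = new HashMap<>();
    for (Map<String, Integer> vector : vectors) {
      for (String term : vector.keySet()) {
        df.merge(term, 1, Integer::sum);
      }
    }
    return df;
  }

  // Stage 2: reweight one vector as tf * ln(numDocs / df).
  // Terms missing from df are treated as occurring in one document (an assumption).
  static Map<String, Double> tfIdf(Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
    Map<String, Double> weighted = new HashMap<>();
    for (Map.Entry<String, Integer> e : tf.entrySet()) {
      int docCount = df.getOrDefault(e.getKey(), 1);
      weighted.put(e.getKey(), e.getValue() * Math.log((double) numDocs / docCount));
    }
    return weighted;
  }

  public static void main(String[] args) {
    List<Map<String, Integer>> vectors = Arrays.asList(
        termFrequencies(Arrays.asList("bayes", "code", "bayes")),
        termFrequencies(Arrays.asList("code", "cleanup")));
    Map<String, Integer> df = documentFrequencies(vectors);
    for (Map<String, Integer> vector : vectors) {
      System.out.println(tfIdf(vector, df, vectors.size()));
    }
  }
}
```

The point of the split is that only stage 1 varies per tokenization strategy; any weighting that needs just the vectors and corpus-level counts can be swapped into stage 2 without touching it.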