[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795139#action_12795139 ]
Robin Anil commented on MAHOUT-220:
-----------------------------------

The current Bayes implementation is an island. If you skim through the training mechanism, you will see that it is very optimised (it uses the fewest map/reduces), and the kind of information I store in HBase and in memory is very specific to that paper. First, there is the weight matrix, with features as rows, labels as columns, and the weight in each cell. Second, there are the sums of the columns and rows, stored along with the weight matrix. Then there are special rows containing the theta normalizer, the alpha smoothing value, etc. You can see it is not really applying Bayes' rule; it is reproducing the math of the CBayes paper. So I see no way of it directly using the SGD model. We could have a Bayes algorithm implementation specific to the model you are training. Is that OK?

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1. Line length used is 120 instead of 80.
> 2. The static final log is kept as is, not renamed to LOG.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
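
For illustration, the storage layout described in the comment above (a feature-by-label weight matrix, its row and column sums, plus the theta normalizer and alpha smoothing value) might be sketched in Java roughly as follows. This is only a sketch under assumptions: the class name `WeightMatrixSketch`, its methods, and the in-memory maps are hypothetical and are not taken from Mahout's code or its HBase schema, and keeping one theta normalizer per label is likewise an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Mahout's actual classes or HBase layout.
class WeightMatrixSketch {
    // weights.get(feature).get(label) is one cell: feature = row, label = column.
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final Map<String, Double> featureSums = new HashMap<>(); // row sums
    private final Map<String, Double> labelSums = new HashMap<>();   // column sums
    // Stand-ins for the "special rows": theta normalizer (assumed per label) and alpha.
    private final Map<String, Double> thetaNormalizers = new HashMap<>();
    private final double alpha;

    WeightMatrixSketch(double alpha) {
        this.alpha = alpha;
    }

    // Adds w to one cell and keeps the row and column sums in step with it.
    void add(String feature, String label, double w) {
        weights.computeIfAbsent(feature, f -> new HashMap<>()).merge(label, w, Double::sum);
        featureSums.merge(feature, w, Double::sum);
        labelSums.merge(label, w, Double::sum);
    }

    // Returns the stored weight, or 0.0 if the cell was never written.
    double weight(String feature, String label) {
        return weights.getOrDefault(feature, Collections.<String, Double>emptyMap())
                      .getOrDefault(label, 0.0);
    }

    double featureSum(String feature) {
        return featureSums.getOrDefault(feature, 0.0);
    }

    double labelSum(String label) {
        return labelSums.getOrDefault(label, 0.0);
    }

    void setThetaNormalizer(String label, double value) {
        thetaNormalizers.put(label, value);
    }

    double thetaNormalizer(String label) {
        return thetaNormalizers.getOrDefault(label, 0.0);
    }

    double alpha() {
        return alpha;
    }

    public static void main(String[] args) {
        WeightMatrixSketch m = new WeightMatrixSketch(1.0);
        m.add("ball", "sports", 2.0);
        m.add("ball", "politics", 1.0);
        m.add("vote", "politics", 3.0);
        System.out.println("row sum for 'ball': " + m.featureSum("ball"));
        System.out.println("column sum for 'politics': " + m.labelSum("politics"));
    }
}
```

The point of the sketch is only that the sums and the normalizer live next to the weights rather than being recomputed, which is why a classifier built on this layout is tied to the CBayes math rather than to a generic model.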