[jira] Commented: (MAHOUT-220) Mahout Bayes Code cleanup

Robin Anil (JIRA) Tue, 29 Dec 2009 12:31:57 -0800

    [ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795139#action_12795139 ]


Robin Anil commented on MAHOUT-220:
-----------------------------------

The current Bayes implementation is an island. If you skim through the training 
mechanism, you will see that it is heavily optimised (it uses as few map/reduce 
passes as possible), and the information I store in HBase and in memory is very 
specific to that paper. 

First, there is the weight matrix: features are rows, labels are columns, and 
each cell holds the weight.
Second, there are the row sums and column sums, stored along with the weight 
matrix. 
Then there are special rows containing the theta normalizer, the alpha 
smoothing value, etc. 
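The sketch below illustrates why the row and column sums are worth storing. In complement Naive Bayes (Rennie et al., 2003, "Tackling the Poor Assumptions of Naive Bayes Text Classifiers"), the weight of feature i for label c comes from the counts *outside* c. Those counts fall out directly as row_sum[i] - weight[i][c] and total - col_sum[c]. The theta normalizer for a label is then the sum of |w_ci| over all features. This is a minimal illustration under stated assumptions. It uses plain in-memory dicts instead of HBase, and the class and method names (`CBayesModel`, `add`, `finalize`, `classify`) are hypothetical; they are not Mahout's actual classes or storage schema.

```python
import math
from collections import defaultdict


class CBayesModel:
    """Hypothetical in-memory sketch of the CBayes model layout.

    Not Mahout's API or HBase schema; it only mirrors the idea of a
    feature x label weight matrix plus row sums, column sums, a per-label
    theta normalizer and an alpha smoothing value.
    """

    def __init__(self, alpha_i=1.0):
        self.alpha_i = alpha_i              # per-feature smoothing value (assumed uniform)
        self.weight = defaultdict(dict)     # weight[feature][label] -> accumulated weight
        self.labels = set()

    def add(self, feature, label, value):
        # Accumulate weight for one (feature, label) cell of the matrix.
        self.labels.add(label)
        row = self.weight[feature]
        row[label] = row.get(label, 0.0) + value

    def finalize(self):
        # Sum of each row (feature) and each column (label), plus the grand total.
        self.row_sum = {f: sum(r.values()) for f, r in self.weight.items()}
        self.col_sum = {l: 0.0 for l in self.labels}
        for r in self.weight.values():
            for l, v in r.items():
                self.col_sum[l] += v
        self.total = sum(self.row_sum.values())
        self.vocab = len(self.weight)
        # Theta normalizer per label: sum over all features of |w_ci|.
        self.theta = {l: sum(abs(self._w(f, l)) for f in self.weight)
                      for l in self.labels}

    def _w(self, feature, label):
        # Complement counts derived from the stored sums:
        # weight of this feature outside `label`, and total weight outside `label`.
        comp = self.row_sum[feature] - self.weight[feature].get(label, 0.0)
        comp_total = self.total - self.col_sum[label]
        return math.log((comp + self.alpha_i) /
                        (comp_total + self.alpha_i * self.vocab))

    def classify(self, doc):
        # doc: feature -> term weight t_i; features unseen in training are skipped.
        scores = {}
        for l in self.labels:
            s = sum(t * self._w(f, l) for f, t in doc.items() if f in self.weight)
            scores[l] = s / self.theta[l]   # same as normalizing each w_ci by theta_c
        # Pick the label whose complement explains the document worst (argmin).
        return min(scores, key=scores.get)


if __name__ == "__main__":
    m = CBayesModel()
    for f, l, v in [("ball", "sports", 3.0), ("goal", "sports", 2.0),
                    ("cpu", "tech", 3.0), ("code", "tech", 2.0)]:
        m.add(f, l, v)
    m.finalize()
    print(m.classify({"ball": 1.0, "goal": 1.0}))  # prints "sports"
```

Note that `classify` takes an argmin over complement-based scores rather than an argmax of a posterior. This is the sense in which the model is not literally applying Bayes' rule.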

You can see it is not really applying Bayes' rule; it is reproducing the math of 
the CBayes paper. So I see no way for it to use the SGD model directly. 

We could have a Bayes algorithm implementation specific to the model you are 
training. Is that OK?

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanups, with 
> the following exceptions:
> 1.  Line length used is 120 instead of 80. 
> 2.  The static final log is kept as is, not renamed to LOG. 

-- 
This message is automatically generated by JIRA.
