[ 
https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795133#action_12795133
 ] 

Robin Anil commented on MAHOUT-220:
-----------------------------------

Anyway, I guess we are sounding like ML engineers here. This is a library; our 
job is to provide options for people like us to debate over :). So let's agree 
on a common mechanism. 

That is, have different ways to create a term frequency vector from documents, 
i.e. List<String> => SparseVector. 
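
To make the List<String> => SparseVector step concrete, here is a rough sketch 
of one way it could look. It is not Mahout code: the class name 
TermFrequencySketch, the method termFrequencies and the shared dictionary are 
all made up for illustration, and a plain Map<Integer, Double> stands in for 
SparseVector so the example runs on its own. 

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not part of Mahout. */
final class TermFrequencySketch {

  private TermFrequencySketch() { }

  /**
   * Counts term occurrences in one tokenized document.
   *
   * The dictionary maps each term to an integer index and grows whenever an
   * unseen term appears. The returned map is a stand-in for a sparse vector:
   * index -> count, with every absent index implicitly zero.
   */
  static Map<Integer, Double> termFrequencies(List<String> tokens,
                                              Map<String, Integer> dictionary) {
    Map<Integer, Double> vector = new TreeMap<>();
    for (String token : tokens) {
      Integer index = dictionary.get(token);
      if (index == null) {
        // Assign the next free index to a term we have not seen before.
        index = dictionary.size();
        dictionary.put(token, index);
      }
      // Increment the count stored at this term's index.
      vector.merge(index, 1.0, Double::sum);
    }
    return vector;
  }
}
```

Different tokenizers or n-gram generators would only change how the 
List<String> is produced; everything after this step sees the same kind of 
vector. 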

Once the SparseVector is created, use uniform M/R jobs to do things like TF-IDF 
weighting and log-likelihood (although I think we need the original file to get 
the co-occurrence, not the SparseVector). 
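
As a rough illustration of the TF-IDF part, the two passes below mirror what 
two M/R jobs could compute: document frequencies first, then the reweighting. 
This is plain in-memory Java, not a Hadoop job and not Mahout's API. The names 
TfIdfSketch, documentFrequencies and tfIdf are hypothetical, a 
Map<Integer, Double> again stands in for SparseVector, and the weight 
tf * ln(N / df) is just one common variant that I am assuming here. 

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not part of Mahout. */
final class TfIdfSketch {

  private TfIdfSketch() { }

  /** Pass 1: for each term index, count the documents that contain it. */
  static Map<Integer, Integer> documentFrequencies(List<Map<Integer, Double>> tfVectors) {
    Map<Integer, Integer> df = new HashMap<>();
    for (Map<Integer, Double> vector : tfVectors) {
      for (Integer index : vector.keySet()) {
        df.merge(index, 1, Integer::sum);
      }
    }
    return df;
  }

  /** Pass 2: replace each raw count tf with tf * ln(N / df). */
  static List<Map<Integer, Double>> tfIdf(List<Map<Integer, Double>> tfVectors) {
    Map<Integer, Integer> df = documentFrequencies(tfVectors);
    int numDocs = tfVectors.size();
    List<Map<Integer, Double>> weighted = new ArrayList<>();
    for (Map<Integer, Double> vector : tfVectors) {
      Map<Integer, Double> out = new TreeMap<>();
      for (Map.Entry<Integer, Double> entry : vector.entrySet()) {
        // Assumed IDF variant; a term present in every document gets weight 0.
        double idf = Math.log((double) numDocs / df.get(entry.getKey()));
        out.put(entry.getKey(), entry.getValue() * idf);
      }
      weighted.add(out);
    }
    return weighted;
  }
}
```

Both passes read only the term frequency vectors, which is why this kind of 
weighting can run uniformly after any vectorizer. 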

Any ideas?






> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with 
> the following exceptions:
> 1.  Line length used is 120 instead of 80. 
> 2.  The static final log is kept as is, not renamed to LOG. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
