[ 
https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795122#action_12795122
 ] 

Ted Dunning commented on MAHOUT-220:
------------------------------------


Anil,

See classifier.sgd.TermRandomizer (and its implementations, DenseRandomizer and 
BinaryRandomizer) for a converter from term lists to vectors.  These are in the 
MAHOUT-228 patch.

These classes have the virtue of converting term lists to vectors of fixed 
size.  They currently do no term weighting, but that would be a very easy fix.  
The approach is roughly along the lines of 
http://arxiv.org/PS_cache/arxiv/pdf/0902/0902.2206v2.pdf or the stochastic 
decomposition work.
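
As a rough illustration of the idea only (this is not the TermRandomizer code 
from the MAHOUT-228 patch; the class name, the hash function, and the probes 
parameter below are all assumptions made up for this sketch), hashing each 
term into a fixed-size array might look something like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: maps a term list of any length to a vector of fixed size.
// It is NOT Mahout's TermRandomizer; all names here are illustrative.
public class HashedTermVectorSketch {

    // Seeded 32-bit FNV-1a style hash over the UTF-8 bytes of a term.
    static int hash(String term, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Each occurrence of a term adds 1.0 to `probes` hashed positions.
    // No term weighting is applied, matching the limitation noted above.
    static double[] vectorize(List<String> terms, int size, int probes) {
        double[] v = new double[size];
        for (String term : terms) {
            for (int p = 0; p < probes; p++) {
                // floorMod keeps the index non-negative for negative hashes.
                int index = Math.floorMod(hash(term, p), size);
                v[index] += 1.0;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("mahout", "bayes", "mahout");
        System.out.println(Arrays.toString(vectorize(terms, 16, 2)));
    }
}
```

The output size depends only on the size argument, never on the vocabulary, 
so no dictionary has to be built up front.  Adding term weighting would 
amount to replacing the 1.0 increment with a per-term weight.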

If you like these, we can promote them to a common area under classifier.

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup, with 
> the following exceptions:
> 1.  Line length used is 120 instead of 80.
> 2.  The static final logger is kept as log, not renamed to LOG.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
