[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795122#action_12795122 ]
Ted Dunning commented on MAHOUT-220:
------------------------------------

Anil,

See classifier.sgd.TermRandomizer (and its implementations DenseRandomizer and BinaryRandomizer) for a term-list-to-vector converter. These are in the MAHOUT-228 patch.

It has the virtue of converting term lists to vectors of fixed size. It currently does not do term weighting, but that would be a very easy fix.

The approach is roughly along the lines of http://arxiv.org/PS_cache/arxiv/pdf/0902/0902.2206v2.pdf or the stochastic decomposition work.

If you like these, we can promote them to a common area under classifier.

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1. Line length used is 120 instead of 80.
> 2. The static final log is kept as is, not renamed to LOG.
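
For illustration, the fixed-size term-list-to-vector conversion described in the comment can be sketched with feature hashing. This is only a sketch under assumptions, not the TermRandomizer code from the MAHOUT-228 patch: the class name HashedTermEncoder, the encode method, the numFeatures and numProbes parameters, and the mix hash are all hypothetical. Like the converter described in the comment, it does no term weighting.

```java
import java.util.List;

/** Hypothetical illustration only; NOT the TermRandomizer from MAHOUT-228. */
final class HashedTermEncoder {
    private HashedTermEncoder() {}

    /**
     * Encodes a term list as a vector with exactly numFeatures entries.
     * Each term is hashed numProbes times; every probe adds +1 or -1
     * to one slot, so the output size never depends on the vocabulary.
     */
    static double[] encode(List<String> terms, int numFeatures, int numProbes) {
        if (numFeatures <= 0 || numProbes <= 0) {
            throw new IllegalArgumentException("numFeatures and numProbes must be positive");
        }
        double[] vector = new double[numFeatures];
        for (String term : terms) {
            for (int probe = 0; probe < numProbes; probe++) {
                // Derive a per-probe hash from the term's hash code.
                int h = mix(term.hashCode() * 31 + probe);
                // floorMod keeps the index non-negative for negative hashes.
                int index = Math.floorMod(h, numFeatures);
                // A second hash picks the sign, so collisions tend to cancel.
                double sign = (mix(h ^ 0x9e3779b9) & 1) == 0 ? 1.0 : -1.0;
                vector[index] += sign; // unweighted: every occurrence counts as 1
            }
        }
        return vector;
    }

    /** Simple integer bit mixer (assumed here; any well-spread hash would do). */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Calling HashedTermEncoder.encode(Arrays.asList("mahout", "bayes", "mahout"), 16, 2) returns a 16-element array regardless of how many distinct terms appear. Term weighting could be added by replacing the unit increment with a per-term weight.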