[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795137#action_12795137 ]

Jake Mannix commented on MAHOUT-220:
------------------------------------

bq. The extreme case is the DenseRandomizer. Every term gets spread out to every feature so you have collisions on every term on every feature. Because of the random weighting, you preserve enough information to allow effective learning.

Right, this is the use case in the stochastic decomposition case, cool.

bq. Should we generalize this concept to Vectorizer? The dictionary approach can accept a previously computed dictionary (possibly augmenting it on the fly) and might be called a DictionaryVectorizer or WeightedDictionaryVectorizer. At the level I have been working, the storage of the dictionary is an open question. The randomizers could inherit from the same basic interface (or abstract class).

Definitely.

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following isabel's checkstyle, I am adding a whole slew of code cleanup with the following exceptions
> 1. Line length used is 120 instead of 80.
> 2. static final log is kept as is. not LOG.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.