[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795137#action_12795137 ]

Jake Mannix commented on MAHOUT-220:
------------------------------------

bq. The extreme case is the DenseRandomizer. Every term gets spread out to 
every feature so you have collisions on every term on every feature. Because of 
the random weighting, you preserve enough information to allow effective 
learning.

Right, this is the use case for stochastic decomposition, cool.
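To make the DenseRandomizer idea above concrete, here is a minimal sketch (not the actual Mahout class; names and signatures are illustrative): each term seeds a deterministic RNG and spreads its weight across every output feature, so every term collides with every other term on every feature, but the random weights keep terms separable in expectation.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a dense randomizer. Every input term contributes
// a randomly weighted value to every output feature: collisions happen on
// every feature, but random weighting preserves enough information for
// learning, as described above.
class DenseRandomizerSketch {
  private final int numFeatures;

  DenseRandomizerSketch(int numFeatures) {
    this.numFeatures = numFeatures;
  }

  double[] vectorize(Map<String, Double> termWeights) {
    double[] out = new double[numFeatures];
    for (Map.Entry<String, Double> e : termWeights.entrySet()) {
      // Seed the RNG from the term itself, so the same term always maps
      // to the same dense random direction.
      Random rng = new Random(e.getKey().hashCode());
      for (int f = 0; f < numFeatures; f++) {
        out[f] += e.getValue() * rng.nextGaussian();
      }
    }
    return out;
  }
}
```

Because the per-term direction is derived deterministically from the term, vectorizing the same document twice yields the same dense vector, with no dictionary to store.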

bq. Should we generalize this concept to Vectorizer? The dictionary approach 
can accept a previously computed dictionary (possibly augmenting it on the fly) 
and might be called a DictionaryVectorizer or WeightedDictionaryVectorizer. At 
the level I have been working, the storage of the dictionary is an open 
question. The randomizers could inherit from the same basic interface (or 
abstract class).

Definitely.  
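One possible shape for the shared abstraction discussed in the quote (purely illustrative, not existing Mahout API): a common Vectorizer interface that both the dictionary approach and the randomizers could implement, with the dictionary variant augmenting its term-to-index map on the fly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common interface for dictionary-based vectorizers and
// randomizers alike. Names here are assumptions for illustration.
interface Vectorizer {
  double[] vectorize(Map<String, Double> termWeights);
  int numFeatures();
}

// Dictionary approach: each previously unseen term is assigned the next
// free index, augmenting the dictionary on the fly as described above.
class DictionaryVectorizer implements Vectorizer {
  private final Map<String, Integer> dictionary = new LinkedHashMap<>();
  private final int cardinality;

  DictionaryVectorizer(int cardinality) {
    this.cardinality = cardinality;
  }

  public int numFeatures() {
    return cardinality;
  }

  public double[] vectorize(Map<String, Double> termWeights) {
    double[] out = new double[cardinality];
    for (Map.Entry<String, Double> e : termWeights.entrySet()) {
      int idx = dictionary.computeIfAbsent(e.getKey(), k -> dictionary.size());
      if (idx < cardinality) {
        out[idx] += e.getValue();
      }
    }
    return out;
  }
}
```

A randomizer would implement the same interface but need no stored dictionary, which leaves the open question of dictionary storage confined to the dictionary-backed implementations.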

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanups, with the following exceptions:
> 1.  Line length used is 120 instead of 80.
> 2.  The static final log is kept as is, not LOG.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
