[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795127#action_12795127 ]
Robin Anil commented on MAHOUT-220:
-----------------------------------

I am not clear on what happens there when two words have the same hash. Aren't we losing a lot of information? The approach I am proposing does exact numbering of the features.

One thing my method suffers from is the addition of new data. It will take another couple of M/R jobs to create the new dictionary file while preserving the old ids. It's cumbersome but doable.

What happens in the Randomizer approach? Since you are fixing the feature set size, won't the hash ids also change when that feature set size increases?

> Mahout Bayes Code cleanup
> -------------------------
>
>                 Key: MAHOUT-220
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-220
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>   Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
>
>
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
> the following exceptions:
> 1. Line length used is 120 instead of 80.
> 2. The static final log is kept as is, not renamed to LOG.
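
The trade-off raised in the comment (hashed feature ids versus exact dictionary numbering) can be sketched in a few lines of Python. This is an illustration only, not Mahout code: `hashed_id` and `extend_dictionary` are hypothetical names, CRC32 is an assumed stand-in for whatever hash a randomizer would use, and a single in-memory loop stands in for the extra M/R jobs.

```python
import zlib


def hashed_id(word, num_features):
    """Map a word to a feature id by hashing into a fixed-size feature set.

    CRC32 is an assumption made for this sketch; it is deterministic across
    runs, unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(word.encode("utf-8")) % num_features


def extend_dictionary(dictionary, words):
    """Give each unseen word the next free id, leaving existing ids untouched.

    In a real pipeline this would be a separate pass over the new data;
    here it is a plain loop over an in-memory dict.
    """
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary)
    return dictionary


words = ["bayes", "mahout", "hadoop", "lucene", "solr"]

# Hashing: five words into four slots must share at least one id, and the
# classifier can no longer tell the colliding words apart.
small = {w: hashed_id(w, 4) for w in words}

# Growing the feature set changes the modulus, so a word's id may change too.
large = {w: hashed_id(w, 8) for w in words}
moved = [w for w in words if small[w] != large[w]]

# Exact numbering: no collisions, and old ids survive the addition of new data.
dictionary = extend_dictionary({}, ["bayes", "mahout", "hadoop"])
dictionary = extend_dictionary(dictionary, ["mahout", "lucene"])
```

With exact numbering every word keeps a distinct id at the cost of an extra pass whenever new data arrives. With hashing no dictionary pass is needed, but distinct words can share an id, and ids are only stable while the feature set size stays fixed.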