[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679174#comment-13679174 ]
Grant Ingersoll commented on MAHOUT-1147:
-----------------------------------------

[~jp...@sussex.ac.uk] Do you happen to have a test case that verifies this?

> CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix
> -----------------------------------------------------------------------------------
>
> Key: MAHOUT-1147
> URL: https://issues.apache.org/jira/browse/MAHOUT-1147
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.7
> Environment: Eclipse IDE
>   Java code base
>   CVB0Driver Class
>   setModelPaths(Job job, Path modelPath) - method
> Reporter: Jack Pay
> Assignee: Jake Mannix
> Labels: bug, cvb, fix, suggestion
> Fix For: 0.8
>
> Attachments: MAHOUT-1147.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Problem:
> When training doc/topic model no paths for the term/topic model found
> (outputs null).
> These paths are set using setModelPaths in CVB0Driver.
>
> Reason for Problem:
> Variety of Job instances call this method.
> The Job is passed to the method instead of the Configuration object given to
> the Job.
> The configuration is retrieved from the Job instance itself.
> I believe that this Configuration instance is a clone of the original.
> This is a problem as the variable MODEL_PATHS is set on the clone which is
> then discarded when the given Job is complete.
> The original Configuration has no MODEL_PATHS String set and therefore
> returns null.
> The code stipulates that if it cannot find a model to use a new random
> matrix. This happens every time as MODEL_PATHS is not set for the
> Configuration instance used.
>
> Solution:
> Do not pass the Job to the setModels method, but pass the Configuration
> instance passed into the method which created the Job.
> i.e.
> change from:
>   setModelPaths(Job job, Path modelPath)
> to:
>   setModelPaths(Configuration conf, Path modelPath)
> And change all calling methods accordingly (obviously).
> So far what little testing I have done appears to solve this problem.
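
The mechanism described in the report can be illustrated without Hadoop on the classpath. The following is only a minimal sketch, not code from CVB0Driver or from the attached patch: Conf and FakeJob are hypothetical stand-ins for Configuration and Job, the key string is invented for illustration, and the sketch simply assumes (as the report believes) that the job holds its own copy of the configuration it was created from.

```java
import java.util.HashMap;
import java.util.Map;

public class ModelPathsSketch {
    // Hypothetical key; stands in for the MODEL_PATHS constant named in the report.
    static final String MODEL_PATHS = "sketch.model.paths";

    /** Stand-in for a configuration object: a string map with a copy constructor. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key); // null when the key was never set
        }
    }

    /** Stand-in for a job that copies the configuration it is given (assumption). */
    static class FakeJob {
        private final Conf conf;

        FakeJob(Conf original) {
            this.conf = new Conf(original);
        }

        Conf getConfiguration() {
            return conf;
        }
    }

    // Pattern described as the problem: the write lands on the job's private copy.
    static void setModelPaths(FakeJob job, String modelPath) {
        job.getConfiguration().set(MODEL_PATHS, modelPath);
    }

    // Pattern proposed as the fix: the write lands on the caller's configuration.
    static void setModelPaths(Conf conf, String modelPath) {
        conf.set(MODEL_PATHS, modelPath);
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        FakeJob job = new FakeJob(conf);

        setModelPaths(job, "/tmp/model-1");
        System.out.println("after Job-based setter: " + conf.get(MODEL_PATHS));

        setModelPaths(conf, "/tmp/model-1");
        System.out.println("after Configuration-based setter: " + conf.get(MODEL_PATHS));
    }
}
```

Under that copying assumption, the first line printed shows null for the original configuration and the second shows the path. A real test case for the driver would need to make the equivalent check against the Configuration that later jobs are actually built from.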