So this patch doesn't quite work as-is: there are scoping problems where its member variables collide with those of the parent class. I've got it working after some simple modifications, but I think it can be fixed much more simply by keeping initialization in the parent class.
I'll post a patch later tonight.

On Monday, June 10, 2013, Grant Ingersoll (JIRA) wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679858#comment-13679858]
>
> Grant Ingersoll commented on MAHOUT-1147:
> -----------------------------------------
>
> Do you see:
> {code}
> echo "Extracting Reuters"
> $MAHOUT org.apache.lucene.benchmark.utils.ExtractReuters ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-out
> if [ "$HADOOP_HOME" != "" ] && [ "$MAHOUT_LOCAL" == "" ] ; then
>   echo "Copying Reuters data to Hadoop"
>   set +e
>   $HADOOP dfs -rmr ${WORK_DIR}/reuters-sgm
>   $HADOOP dfs -rmr ${WORK_DIR}/reuters-out
>   set -e
>   $HADOOP dfs -put ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-sgm
>   $HADOOP dfs -put ${WORK_DIR}/reuters-out ${WORK_DIR}/reuters-out
> fi
> {code}
>
> Also, I'm on #mahout on IRC if that helps us resolve this faster.
>
> > CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix
> > -----------------------------------------------------------------------------------
> >
> >                 Key: MAHOUT-1147
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1147
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Clustering
> >    Affects Versions: 0.7
> >         Environment: Eclipse IDE
> >                      Java code base
> >                      CVB0Driver Class
> >                      setModelPaths(Job job, Path modelPath) - method
> >            Reporter: Jack Pay
> >            Assignee: Jake Mannix
> >              Labels: bug, cvb, fix, suggestion
> >             Fix For: 0.8
> >
> >         Attachments: MAHOUT-1147.patch, MAHOUT-1147.patch
> >
> >   Original Estimate: 24h
> >  Remaining Estimate: 24h
> >
> > Problem:
> > When training doc/topic model no paths for the term/topic model found (outputs null).
> > These paths are set using setModelPaths in CVB0Driver.
> > Reason for Problem:
> > Variety of Job instances call this method.
> > The Job is passed to the method instead of the Configuration object given to the Job.
> > The configuration is retrieved from the Job instance itself.
> > I believe that this Configuration instance is a clone of the original.
> > This is a problem as the variable MODEL_PATHS is set on the clone which is then discarded when the given Job is complete.
> > The original Configuration has no MODEL_PATHS String set and therefore returns null.
> > The code stipulates that if it cannot find a model to use a new random matrix. This happens every time as MODEL_PATHS is not set for the Configuration instance used.
> > Solution:
> > Do not pass the Job to the setModels method, but pass the Configuration instance passed into the method which created the Job.
> > i.e.
> > change from:
> > setModelPaths(Job job, Path modelPath)
> > to:
> > setModelPaths(Configuration conf, Path modelPath)
> > And change all calling methods accordingly (obviously).
> > So far what little testing I have done appears to solve this problem.
> >
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

--
  -jake
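
For anyone following the quoted report, here is a minimal sketch of the copy behaviour it describes. It does not use Hadoop or Mahout: Conf and Job below are hypothetical stand-ins, the key string is made up, and the sketch simply assumes (as the reporter believes) that a Job keeps its own copy of the Configuration it was built from. It is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; these are NOT Hadoop's Configuration and Job.
class ModelPathsSketch {
    // Made-up key for illustration; the real MODEL_PATHS value is not shown in the thread.
    static final String MODEL_PATHS = "example.model.paths";

    static class Conf {
        private final Map<String, String> props = new HashMap<>();
        Conf() {}
        // Copy constructor: the new Conf shares no state with the original.
        Conf(Conf other) { props.putAll(other.props); }
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    static class Job {
        private final Conf conf;
        // Assumption from the report: the job holds a clone, not the caller's instance.
        Job(Conf original) { this.conf = new Conf(original); }
        Conf getConfiguration() { return conf; }
    }

    // Pattern the report describes as broken: the value lands on the job's private copy.
    static void setModelPaths(Job job, String modelPath) {
        job.getConfiguration().set(MODEL_PATHS, modelPath);
    }

    // Pattern the report proposes: the value lands on the caller's Conf.
    static void setModelPaths(Conf conf, String modelPath) {
        conf.set(MODEL_PATHS, modelPath);
    }

    // Returns what the caller's Conf sees after setting the path through the Job.
    static String viaJob() {
        Conf conf = new Conf();
        Job job = new Job(conf);
        setModelPaths(job, "/tmp/model-1");
        return conf.get(MODEL_PATHS);
    }

    // Returns what the caller's Conf sees after setting the path on the Conf directly.
    static String viaConf() {
        Conf conf = new Conf();
        setModelPaths(conf, "/tmp/model-1");
        return conf.get(MODEL_PATHS);
    }

    public static void main(String[] args) {
        System.out.println("set via Job:  " + viaJob());
        System.out.println("set via Conf: " + viaConf());
    }
}
```

Under that assumption the first call leaves the caller's Conf without a model path, which would match the "returns null, so a random matrix is used" symptom in the report, while the second keeps the path visible to later jobs built from the same Conf.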
