This patch doesn't quite work as is, because of scoping problems: some
of its member variables collide with those of the parent class.  I got
it working after some simple modifications, but I think it can be fixed
much more simply by keeping initialization in the parent class.

I'll post a patch later tonight.

On Monday, June 10, 2013, Grant Ingersoll (JIRA) wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679858#comment-13679858]
>
> Grant Ingersoll commented on MAHOUT-1147:
> -----------------------------------------
>
> Do you see:
> {code}
>     echo "Extracting Reuters"
>     $MAHOUT org.apache.lucene.benchmark.utils.ExtractReuters ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-out
>     if [ "$HADOOP_HOME" != "" ] && [ "$MAHOUT_LOCAL" == "" ] ; then
>         echo "Copying Reuters data to Hadoop"
>         set +e
>         $HADOOP dfs -rmr ${WORK_DIR}/reuters-sgm
>         $HADOOP dfs -rmr ${WORK_DIR}/reuters-out
>         set -e
>         $HADOOP dfs -put ${WORK_DIR}/reuters-sgm ${WORK_DIR}/reuters-sgm
>         $HADOOP dfs -put ${WORK_DIR}/reuters-out ${WORK_DIR}/reuters-out
>     fi
> {code}
>
> Also, I'm on #mahout on IRC if that helps us resolve this faster.
>
> > CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix
> > -----------------------------------------------------------------------------------
> >
> >                 Key: MAHOUT-1147
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1147
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Clustering
> >    Affects Versions: 0.7
> >         Environment: Eclipse IDE
> > Java code base
> > CVB0Driver Class
> > setModelPaths(Job job, Path modelPath) - method
> >            Reporter: Jack Pay
> >            Assignee: Jake Mannix
> >              Labels: bug, cvb, fix, suggestion
> >             Fix For: 0.8
> >
> >         Attachments: MAHOUT-1147.patch, MAHOUT-1147.patch
> >
> >   Original Estimate: 24h
> >  Remaining Estimate: 24h
> >
> > Problem:
> > When training the doc/topic model, no paths for the term/topic model
> > are found (the output is null).
> > These paths are set using setModelPaths in CVB0Driver.
> > Reason for Problem:
> > A variety of Job instances call this method.
> > The Job is passed to the method instead of the Configuration object
> > given to the Job.
> > The configuration is retrieved from the Job instance itself.
> > I believe that this Configuration instance is a clone of the original.
> > This is a problem because the variable MODEL_PATHS is set on the
> > clone, which is then discarded when the given Job is complete.
> > The original Configuration has no MODEL_PATHS String set and therefore
> > returns null.
> > The code stipulates that if it cannot find a model, it uses a new
> > random matrix. This happens every time, as MODEL_PATHS is not set for
> > the Configuration instance used.
> > Solution:
> > Do not pass the Job to the setModelPaths method; instead, pass the
> > Configuration instance that was passed into the method which created
> > the Job.
> > i.e.
> > change from:
> > setModelPaths(Job job, Path modelPath)
> > to:
> > setModelPaths(Configuration conf, Path modelPath)
> > And change all calling methods accordingly (obviously).
> > So far, what little testing I have done suggests that this solves the problem.
> >
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
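
For anyone following along, here is a rough sketch of the behavior the
report describes. It does not use Hadoop at all: Conf and Job below are
hypothetical stand-ins that only mimic a Job copying the Configuration
it is constructed with, and the key string is made up for the demo.
The two setModelPaths overloads mirror the "change from" and "to"
signatures in the report, with String in place of Path.

```java
import java.util.HashMap;
import java.util.Map;

public class ModelPathsDemo {
    // Made-up key for this demo, not necessarily the value used in CVB0Driver.
    static final String MODEL_PATHS = "demo.model.paths";

    /** Hypothetical stand-in for a Configuration: a plain string map. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor: the new Conf shares no state with the original. */
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for a Job that copies the Conf it is given. */
    static class Job {
        private final Conf conf;

        Job(Conf original) {
            this.conf = new Conf(original);
        }

        Conf getConfiguration() {
            return conf;
        }
    }

    /** Variant from the report: the value lands on the Job's private copy. */
    static void setModelPaths(Job job, String modelPath) {
        job.getConfiguration().set(MODEL_PATHS, modelPath);
    }

    /** Proposed variant: the value lands on the caller's own Conf. */
    static void setModelPaths(Conf conf, String modelPath) {
        conf.set(MODEL_PATHS, modelPath);
    }

    public static void main(String[] args) {
        Conf viaJob = new Conf();
        setModelPaths(new Job(viaJob), "/tmp/model-1");
        System.out.println("via Job:  " + viaJob.get(MODEL_PATHS));

        Conf direct = new Conf();
        setModelPaths(direct, "/tmp/model-1");
        System.out.println("via Conf: " + direct.get(MODEL_PATHS));
    }
}
```

In this sketch the first lookup prints null and the second prints the
path. Note that the copy cuts both ways: a Job built from the Conf
before the value is set would not see it either, so the order of calls
matters.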


-- 

  -jake
