[ 
https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640832#comment-13640832
 ] 

Jake Mannix commented on MAHOUT-1047:
-------------------------------------

In general I think this is the right approach, but looking a little closer 
reminds me that in mapreduce mode (as in the status output at the top of the 
original bug report), 
[code]
public void train(VectorIterable matrix, VectorIterable docTopicCounts) {
[code]
isn't used.  Instead, if you look in 

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java

you see that the ModelTrainer is instantiated and start()'ed in mapper setup, 
then train() is called per document, and then modelTrainer.stop() is called in 
cleanup.  So the writeModel.stop() call you add in this patch is probably 
necessary in the mapper too.  Maybe a better place for it is inside 
ModelTrainer.stop(): just make sure the calls to stop() are delegated down to 
the read/write models when the trainer is stopped.
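
To make the delegation idea concrete, here is a minimal sketch.  SketchModel 
and SketchTrainer are hypothetical stand-ins, not Mahout's actual classes, and 
the sketch assumes the hang comes from executor threads that are never shut 
down, which nothing in this thread confirms.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a model that owns a background updater thread. */
class SketchModel {
  private final ExecutorService updater = Executors.newSingleThreadExecutor();
  private final AtomicInteger updates = new AtomicInteger();

  /** Queues one update; the counter is a placeholder for real model work. */
  void update() {
    updater.execute(updates::incrementAndGet);
  }

  /** Shuts the updater down so its non-daemon thread cannot keep the JVM alive. */
  void stop() {
    shutdownAndWait(updater);
  }

  boolean isStopped() {
    return updater.isTerminated();
  }

  int updateCount() {
    return updates.get();
  }

  /** Orderly shutdown: finish queued tasks, force-cancel if they take too long. */
  static void shutdownAndWait(ExecutorService pool) {
    pool.shutdown();
    try {
      if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
        pool.shutdownNow();
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }
}

/** Hypothetical stand-in for a trainer that reads one model and writes another. */
class SketchTrainer {
  private final SketchModel readModel;
  private final SketchModel writeModel;
  private ExecutorService workers;

  SketchTrainer(SketchModel readModel, SketchModel writeModel) {
    this.readModel = readModel;
    this.writeModel = writeModel;
  }

  void start() {
    workers = Executors.newFixedThreadPool(2);
  }

  /** Trains on one document; the vector is ignored in this placeholder. */
  void train(double[] document) {
    workers.execute(writeModel::update);
  }

  /** Stops the trainer's own workers first, then delegates to both models. */
  void stop() {
    SketchModel.shutdownAndWait(workers);
    readModel.stop();
    writeModel.stop();
  }
}

class StopDelegationDemo {
  public static void main(String[] args) {
    SketchModel read = new SketchModel();
    SketchModel write = new SketchModel();
    SketchTrainer trainer = new SketchTrainer(read, write);
    trainer.start();                        // mapper setup
    trainer.train(new double[] {1.0, 0.0}); // once per document
    trainer.stop();                         // mapper cleanup
    System.out.println("models stopped: " + (read.isStopped() && write.isStopped()));
  }
}
```

With stop() delegating like this, a caller that follows the setup / per-document 
/ cleanup pattern only has to remember the single trainer.stop() call, whether 
it is the mapper or the in-memory driver.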
                
> CVB hangs after completion
> --------------------------
>
>                 Key: MAHOUT-1047
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1047
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>         Environment: Ubuntu
>            Reporter: seth boyles
>            Priority: Minor
>              Labels: cvb, lda
>             Fix For: 0.7, 0.8
>
>         Attachments: MAHOUT-1047.patch, MAHOUT-1047-Show-Leak.patch
>
>
> After running the new LDA CVB implementation, it hangs and does not terminate 
> the process like every other time I run Mahout
> Terminal output:
> 12/07/19 11:38:49 INFO mapred.LocalJobRunner: 
> 12/07/19 11:38:49 INFO mapred.Task: Task 'attempt_local_0022_m_000000_0' done.
> 12/07/19 11:38:49 INFO mapred.JobClient:  map 100% reduce 0%
> 12/07/19 11:38:49 INFO mapred.JobClient: Job complete: job_local_0022
> 12/07/19 11:38:49 INFO mapred.JobClient: Counters: 8
> 12/07/19 11:38:49 INFO mapred.JobClient:   File Output Format Counters 
> 12/07/19 11:38:49 INFO mapred.JobClient:     Bytes Written=2247793
> 12/07/19 11:38:49 INFO mapred.JobClient:   File Input Format Counters 
> 12/07/19 11:38:49 INFO mapred.JobClient:     Bytes Read=1920337
> 12/07/19 11:38:49 INFO mapred.JobClient:   FileSystemCounters
> 12/07/19 11:38:49 INFO mapred.JobClient:     FILE_BYTES_READ=1342812616
> 12/07/19 11:38:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1326092302
> 12/07/19 11:38:49 INFO mapred.JobClient:   Map-Reduce Framework
> 12/07/19 11:38:49 INFO mapred.JobClient:     Map input records=2772
> 12/07/19 11:38:49 INFO mapred.JobClient:     Spilled Records=0
> 12/07/19 11:38:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=140
> 12/07/19 11:38:49 INFO mapred.JobClient:     Map output records=2772
> 12/07/19 11:38:49 INFO driver.MahoutDriver: Program took 4089950 ms (Minutes: 
> 68.16583333333334)
> $MAHOUT_HOME/mahout cvb -i 
> /home/seth/Scripted/mahout_data/vectors/vectors/vectors-for-cvb/ -o 
> /home/seth/Scripted/mahout_data/clusters/ -ow -k 90 -dt 
> /home/seth/Scripted/mahout_data/distributions -dict 
> /home/seth/Scripted/mahout_data/vectors/vectors/dictionary.file-0 -mt 
> /home/seth/Scripted/mahout_data/temp/ -x 20 -cd 0.05

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
