[ 
https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-1047:
---------------------------------

    Attachment: MAHOUT-1047-Show-Leak.patch

Attached a hack^Wpatch that might help reproduce the issue:

com.carrotsearch.randomizedtesting-runner tracks threads spawned during unit
tests. I added RandomizedTest as the parent class of our Mahout test class,
configured it not to complain about leaking threads by default (a few test
classes do not join their threads), and enabled thread leak tracking only for
TestCVBModelTrainer.
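
For readers unfamiliar with thread leak tracking, the idea can be sketched in
plain Java: snapshot the live threads before a task runs, snapshot them again
afterwards, and report whatever is new and still alive. This is only an
illustration of the general technique, not how randomizedtesting-runner
implements it; the class name ThreadLeakSketch and the method leakedThreads
are hypothetical names chosen for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not part of Mahout or randomizedtesting-runner.
final class ThreadLeakSketch {

  /** Returns threads that are alive after the task ran but did not exist before it. */
  static Set<Thread> leakedThreads(Runnable task) {
    // Snapshot of all live threads before the task starts.
    Set<Thread> before = new HashSet<>(Thread.getAllStackTraces().keySet());
    task.run();
    // Snapshot after the task; keep only threads that are new and still running.
    Set<Thread> after = new HashSet<>(Thread.getAllStackTraces().keySet());
    after.removeAll(before);
    after.removeIf(t -> !t.isAlive());
    return after;
  }

  public static void main(String[] args) {
    // A task that starts worker threads and never shuts the pool down.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    Set<Thread> leaked = leakedThreads(() -> {
      for (int i = 0; i < 4; i++) {
        pool.submit(() -> { });
      }
    });
    for (Thread t : leaked) {
      System.out.println("leaked: " + t.getName() + " daemon=" + t.isDaemon());
    }
    // Without this call the non-daemon pool threads keep the JVM alive.
    pool.shutdown();
  }
}
```

Non-daemon threads that are left running keep the JVM from exiting, which
would be consistent with the hang reported in this issue, although nothing in
this message confirms that as the cause.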

The hacky part of the patch is that it adds the randomizedtesting-runner
dependency in compile scope rather than test scope in mahout/math.

To use the patch: make sure you have compiled Mahout trunk at least once with
"mvn install", then apply the patch with -p1. Then, from the project root,
build at least mahout/math with "mvn --projects math install". After that, run
the tweaked test with "mvn --projects core -Dtest=TestCVBModelTrainer test".
You should see complaints like the following in the output:

Tests in error: 
   » ThreadLeak 32 threads leaked from TEST scope at 
testInMemoryCVB0(org.apache...
   » ThreadLeak There are still zombie threads that couldn't be terminated:
   1...
   » ThreadLeak 32 threads leaked from SUITE scope at 
org.apache.mahout.clusteri...
   » ThreadLeak There are still zombie threads that couldn't be terminated:
   1...

                
> CVB hangs after completion
> --------------------------
>
>                 Key: MAHOUT-1047
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1047
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>         Environment: Ubuntu
>            Reporter: seth boyles
>            Priority: Minor
>              Labels: cvb, lda
>             Fix For: 0.7, 0.8
>
>         Attachments: MAHOUT-1047-Show-Leak.patch
>
>
> After running the new LDA CVB implementation, Mahout hangs instead of
> terminating the process as it does on every other run.
> Terminal output:
> 12/07/19 11:38:49 INFO mapred.LocalJobRunner: 
> 12/07/19 11:38:49 INFO mapred.Task: Task 'attempt_local_0022_m_000000_0' done.
> 12/07/19 11:38:49 INFO mapred.JobClient:  map 100% reduce 0%
> 12/07/19 11:38:49 INFO mapred.JobClient: Job complete: job_local_0022
> 12/07/19 11:38:49 INFO mapred.JobClient: Counters: 8
> 12/07/19 11:38:49 INFO mapred.JobClient:   File Output Format Counters 
> 12/07/19 11:38:49 INFO mapred.JobClient:     Bytes Written=2247793
> 12/07/19 11:38:49 INFO mapred.JobClient:   File Input Format Counters 
> 12/07/19 11:38:49 INFO mapred.JobClient:     Bytes Read=1920337
> 12/07/19 11:38:49 INFO mapred.JobClient:   FileSystemCounters
> 12/07/19 11:38:49 INFO mapred.JobClient:     FILE_BYTES_READ=1342812616
> 12/07/19 11:38:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1326092302
> 12/07/19 11:38:49 INFO mapred.JobClient:   Map-Reduce Framework
> 12/07/19 11:38:49 INFO mapred.JobClient:     Map input records=2772
> 12/07/19 11:38:49 INFO mapred.JobClient:     Spilled Records=0
> 12/07/19 11:38:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=140
> 12/07/19 11:38:49 INFO mapred.JobClient:     Map output records=2772
> 12/07/19 11:38:49 INFO driver.MahoutDriver: Program took 4089950 ms (Minutes: 
> 68.16583333333334)
> $MAHOUT_HOME/mahout cvb -i 
> /home/seth/Scripted/mahout_data/vectors/vectors/vectors-for-cvb/ -o 
> /home/seth/Scripted/mahout_data/clusters/ -ow -k 90 -dt 
> /home/seth/Scripted/mahout_data/distributions -dict 
> /home/seth/Scripted/mahout_data/vectors/vectors/dictionary.file-0 -mt 
> /home/seth/Scripted/mahout_data/temp/ -x 20 -cd 0.05

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
