[jira] Commented: (MAHOUT-577) RowSimilarityJob hangs during CooccurrencesMapper

Maya Hristakeva (JIRA) Thu, 06 Jan 2011 06:23:11 -0800

    [ https://issues.apache.org/jira/browse/MAHOUT-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978332#action_12978332 ]


Maya Hristakeva commented on MAHOUT-577:
----------------------------------------

Joris and Sebastian, thanks for the comments. 

I have let the job run for more than 24 hours. The status keeps changing and 
the Map output bytes counter grows very large, but the job never seems to pass 
any data to the reducers. I started a new instance of the job this morning and 
will let it run longer to see whether it makes any more progress than before. 
I'll post again once that job has run for more than 24 hours. 

Also, the original matrix is very sparse (~1%), and the same issue occurs 
regardless of whether I use the original sparse matrix or the one compressed 
by LDA, which is ( 146682 x 5 ). 
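
A rough size estimate may help explain the growing map output. The sketch 
below assumes (this is not confirmed anywhere in this thread) that the 
cooccurrence step emits one record for every pair of rows that share a 
non-zero column, and that the LDA-compressed matrix is dense, so each of its 
5 columns has an entry in all 146682 rows. The name pairs_per_column is made 
up for illustration.

```python
def pairs_per_column(rows_with_entry: int) -> int:
    """Number of unordered row pairs among the rows that have an entry in one column."""
    return rows_with_entry * (rows_with_entry - 1) // 2


NUM_ROWS = 146682  # rows of the LDA-compressed matrix (from the report)
NUM_COLS = 5       # columns of the LDA-compressed matrix (from the report)

# Assumption: the matrix is dense, so every column touches every row.
total_pairs = NUM_COLS * pairs_per_column(NUM_ROWS)
print(f"{total_pairs:,} row pairs across {NUM_COLS} columns")
```

Under these assumptions the pair count runs into the tens of billions, so a 
mapper could finish reading its input long before it finishes writing its 
output.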

Thanks, 
Maya

> RowSimilarityJob hangs during CooccurrencesMapper
> -------------------------------------------------
>
>                 Key: MAHOUT-577
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-577
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>         Environment: Linux Debian 5.0.5, 12GB Ram, Hadoop 20.3 installation 
>            Reporter: Maya Hristakeva
>            Priority: Blocker
>
> Hello,
> When trying to run a RowSimilarityJob on a matrix ( 146682 x 138351 ), the 
> job gets through the RowWeightMapper and WeightedOccurrencesPerColumnReducer, 
> and hangs during the CooccurrencesMapper although it shows that the map tasks 
> are 100% complete. 
> The command I use to run the job is: 
>   hadoop jar mahout-core-0.4-job.jar \
>     org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
>     -Dmapred.input.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaCompressedDocumentsMatrix \
>     -Dmapred.output.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaDocumentSimilarityMatrix \
>     -Dmapred.reduce.tasks=8 -Dmapred.map.tasks=200 \
>     -Dmapred.job.name=LDA_ROW_SIMILARITY_TEST \
>     --tempDir /user/maya.hristakeva/temp/lda/5 \
>     --numberOfColumns 138351 \
>     --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedEuclideanDistanceVectorSimilarity \
>     --maxSimilaritiesPerRow 10
> The output (syslog logs) of the mappers that show 100% complete but are 
> still hanging is: 
> 01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
> 2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65461; kvend = 327605; length = 327680
> 2011-01-05 18:30:06,241 INFO org.apache.hadoop.mapred.MapTask: Finished spill 94
> 2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
> 2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327605; kvend = 262068; length = 327680
> 2011-01-05 18:30:14,528 INFO org.apache.hadoop.mapred.MapTask: Finished spill 95
> 2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
> 2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262068; kvend = 196531; length = 327680
> 2011-01-05 18:30:22,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 96
> .
> .
> .
> This problem does not occur when I use a toy matrix of 100 x 100, but once I 
> give it the original matrix of ..... the problem is always reproducible. 
> Any ideas on what could be causing this? 
> Thanks, 
> Maya Hristakeva
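
The spill lines in the log excerpt above can be turned into per-spill sizes 
with a little arithmetic. The sketch below is only an illustration: it assumes 
that bufstart/bufend and kvstart/kvend are positions in circular buffers whose 
capacities are bufvoid and length, so each difference is taken modulo the 
capacity. The function name spill_sizes and the SAMPLE string (the three spills 
from the excerpt, with the wrapped lines joined) are made up for this example.

```python
import re

# Patterns for the two MapTask log lines that describe one spill.
BUF_RE = re.compile(r"bufstart = (\d+); bufend = (\d+); bufvoid = (\d+)")
KV_RE = re.compile(r"kvstart = (\d+); kvend = (\d+); length = (\d+)")


def spill_sizes(log_text):
    """Return a list of (bytes, records) tuples, one per spill in log_text."""
    bufs = [tuple(map(int, m.groups())) for m in BUF_RE.finditer(log_text)]
    kvs = [tuple(map(int, m.groups())) for m in KV_RE.finditer(log_text)]
    sizes = []
    for (bstart, bend, bvoid), (kstart, kend, klen) in zip(bufs, kvs):
        # Assumption: both buffers wrap around, so use modular differences.
        sizes.append(((bend - bstart) % bvoid, (kend - kstart) % klen))
    return sizes


SAMPLE = """\
bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
kvstart = 65461; kvend = 327605; length = 327680
bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
kvstart = 327605; kvend = 262068; length = 327680
bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
kvstart = 262068; kvend = 196531; length = 327680
"""

for n_bytes, n_records in spill_sizes(SAMPLE):
    print(f"{n_records} records, {n_bytes} bytes, "
          f"{n_bytes / n_records:.1f} bytes per record")
```

Read this way, each spill holds roughly 262 thousand small records and about 
10 MB of data, far less than the 99614720-byte bufvoid, which fits the 
"record full = true" messages: the record index fills up before the byte 
buffer does.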

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
