RowSimilarityJob hangs during CooccurrencesMapper
-------------------------------------------------
Key: MAHOUT-577
URL: https://issues.apache.org/jira/browse/MAHOUT-577
Project: Mahout
Issue Type: Bug
Components: Collaborative Filtering
Affects Versions: 0.4
Environment: Linux Debian 5.0.5, 12GB Ram, Hadoop 20.3 installation
Reporter: Maya Hristakeva
Priority: Blocker
Hello,
When I run RowSimilarityJob on a 146682 x 138351 matrix, the job gets through the
RowWeightMapper and WeightedOccurrencesPerColumnReducer but then hangs in the
CooccurrencesMapper, even though the map tasks are reported as 100% complete.
The command I use to run the job is:
hadoop jar mahout-core-0.4-job.jar \
  org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
  -Dmapred.input.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaCompressedDocumentsMatrix \
  -Dmapred.output.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaDocumentSimilarityMatrix \
  -Dmapred.reduce.tasks=8 -Dmapred.map.tasks=200 \
  -Dmapred.job.name=LDA_ROW_SIMILARITY_TEST \
  --tempDir /user/maya.hristakeva/temp/lda/5 \
  --numberOfColumns 138351 \
  --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedEuclideanDistanceVectorSimilarity \
  --maxSimilaritiesPerRow 10
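The job's two phases can be read in light of how a Euclidean distance splits into per-row and per-pair terms. Per-row weights come first and co-occurrences come second, following the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. Only columns in which both rows are nonzero contribute to the dot product. The following is a minimal single-machine sketch of that identity. It assumes the per-row weight is the squared L2 norm and that the distance d becomes a similarity as 1/(1 + d). The helper names are hypothetical, and this is not Mahout's DistributedEuclideanDistanceVectorSimilarity code.

```python
import math

# Rows are sparse vectors stored as {column_index: value}.
# Hypothetical helpers; these names are illustrative, not Mahout's API.

def row_weight(row):
    """Assumed per-row weight: the squared L2 norm of the row."""
    return sum(v * v for v in row.values())

def dot(a, b):
    """Dot product; only columns nonzero in both rows (co-occurrences) contribute."""
    if len(b) < len(a):
        a, b = b, a  # iterate over the shorter row
    return sum(v * b[c] for c, v in a.items() if c in b)

def euclidean_similarity(a, b):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b)
    sq_dist = row_weight(a) + row_weight(b) - 2.0 * dot(a, b)
    dist = math.sqrt(max(sq_dist, 0.0))  # clamp small negative rounding errors
    return 1.0 / (1.0 + dist)  # assumed distance-to-similarity mapping

a = {0: 1.0, 2: 2.0}
b = {0: 4.0, 1: 3.0, 2: 2.0}
print(euclidean_similarity(a, b))
```

The per-row weights are computed once per row, so the pairwise phase only has to produce dot products, and those come from co-occurring columns.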
The syslog output of the map tasks that report 100% completion but keep hanging is:
01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65461; kvend = 327605; length = 327680
2011-01-05 18:30:06,241 INFO org.apache.hadoop.mapred.MapTask: Finished spill 94
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327605; kvend = 262068; length = 327680
2011-01-05 18:30:14,528 INFO org.apache.hadoop.mapred.MapTask: Finished spill 95
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262068; kvend = 196531; length = 327680
2011-01-05 18:30:22,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 96
.
.
.
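The spill entries are easier to interpret with some arithmetic. Each spill moves about 9.9 MB and 262,143 to 262,144 records out of the sort buffer, which is roughly 38 bytes per record. The first half of the sketch recomputes these figures from the logged values. The second half checks them against assumed Hadoop 0.20 defaults: io.sort.mb=100, io.sort.record.percent=0.05, io.sort.spill.percent=0.80, and 16 accounting bytes per record. The report does not state these settings, but they reproduce bufvoid and length exactly.

```python
# Values copied from the three spill entries in the log.
SPILLS = [
    # (spill no., bufstart, bufend, kvstart, kvend)
    (94, 29085149, 39038598, 65461, 327605),
    (95, 39038598, 48983989, 327605, 262068),
    (96, 48983989, 58929384, 262068, 196531),
]
BUFVOID = 99614720   # serialization buffer size reported in the log
KV_LENGTH = 327680   # record-accounting slots reported in the log

def spill_stats(bufstart, bufend, kvstart, kvend):
    # Both buffers are circular, so differences are taken modulo their size.
    # (For a wrapped byte range this is only approximate; none of these three wrap.)
    nbytes = (bufend - bufstart) % BUFVOID
    nrecords = (kvend - kvstart) % KV_LENGTH
    return nbytes, nrecords

for n, bs, be, ks, ke in SPILLS:
    nbytes, nrecords = spill_stats(bs, be, ks, ke)
    print(f"spill {n}: {nbytes} bytes, {nrecords} records, "
          f"{nbytes / nrecords:.1f} bytes/record")

# Assumed Hadoop 0.20 defaults (not stated in the report), in integer arithmetic.
SORT_BYTES = 100 * 1024 * 1024           # io.sort.mb = 100
ACCT_BYTES = SORT_BYTES * 5 // 100       # io.sort.record.percent = 0.05
SPILL_THRESHOLD = KV_LENGTH * 80 // 100  # io.sort.spill.percent = 0.80
print(SORT_BYTES - ACCT_BYTES, ACCT_BYTES // 16, SPILL_THRESHOLD)
```

Spill numbering may start at 0. If so, "Finished spill 96" means the task had already written at least 97 spills of roughly 262,144 records each. That is on the order of 25 million map output records, still being produced at that point.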
This problem does not occur with a 100 x 100 toy matrix, but with the original
matrix of ..... it is always reproducible.
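The gap between the toy matrix and the full one is consistent with quadratic growth. This holds if the co-occurrence step emits one record for every pair of rows that share a nonzero column. That is an assumption about CooccurrencesMapper; the report does not describe its output. Under this assumption, a column with n nonzeros yields n(n-1)/2 records. The following is a minimal sketch that counts those pairs, with hypothetical helper names.

```python
from math import comb

def pairs_for_column(nnz):
    """Assumed: one record per unordered pair of rows sharing this column."""
    return comb(nnz, 2)

def total_pairs(column_nnz):
    """Sum the pair counts over all columns, given each column's nonzero count."""
    return sum(pairs_for_column(n) for n in column_nnz)

# Toy case: a fully dense 100 x 100 matrix (100 columns, 100 nonzeros each).
toy = total_pairs([100] * 100)
# A single column that is nonzero in every one of the 146682 rows.
one_dense_column = pairs_for_column(146682)
print(toy, one_dense_column)
```

The fully dense toy matrix yields 495,000 pairs. A single fully populated column of the original matrix would yield about 1.08 x 10^10 pairs on its own.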
Any ideas on what could be causing this?
Thanks,
Maya Hristakeva