Hello all, I am running PFP on a fairly large dataset and it works well for smaller subsets of the data. However, once I attempt larger samples, I run into this error in the reducer phase:
10/01/19 00:25:35 INFO mapred.JobClient: Task Id : attempt_, Status : FAILED
Task attempt_ failed to report status for 607 seconds. Killing!

I've also noticed that only one reducer is launched for the FP-Tree mining phase. I've tried passing in -D mapred options, but it doesn't seem like PFPGrowthJob supports them. Is there any way I can increase the timeout, heap size, and/or number of reducers without explicitly changing the code and recompiling? (There's a rough sketch of what I'd like to be able to do at the end of this message.)

Also, from my understanding of the algorithm: as long as the number of groups is higher than the number of features above min support, each feature is guaranteed to be mined in its own group, so each tree should be able to use the maximum available heap. Is that assumption correct?

Thanks!

--sej

p.s. the last few log lines output by the failed reducer:

2010-01-18 13:06:39,418 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned items 9091
2010-01-18 13:06:39,530 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 10000 Transactions
2010-01-18 13:06:39,649 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 20000 Transactions
2010-01-18 13:06:39,758 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 30000 Transactions
2010-01-18 13:06:39,774 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP Tree: 40904
2010-01-18 13:06:39,775 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 3393
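p.p.s. For what it's worth, here is roughly what I'd like to be able to do. The property names are the standard Hadoop ones (mapred.task.timeout, mapred.reduce.tasks, mapred.child.java.opts); whether PFPGrowthJob would actually honor a Configuration set up this way is exactly what I don't know, so please treat this as a sketch of intent, not something I've gotten working:

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical driver sketch: set the Hadoop job properties I care about
    // programmatically instead of via -D on the command line.
    public class PFPConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Raise the task timeout above the ~600s default (value is in milliseconds).
            conf.setLong("mapred.task.timeout", 1800000L);
            // Request more reducers for the FP-Tree mining phase.
            conf.setInt("mapred.reduce.tasks", 20);
            // Give each child JVM a larger heap.
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            // ...and then hand this conf to the PFP job. I don't see a hook for
            // passing it through PFPGrowthJob as shipped, hence my question.
            System.out.println("mapred.task.timeout = " + conf.get("mapred.task.timeout"));
        }
    }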
