Take note that the trunk version has the FP-Bonsai implementation integrated. You will see a substantial speed boost for long transactions (>10 items).
On Tue, Jan 19, 2010 at 10:28 AM, Robin Anil <[email protected]> wrote:
> Are you running it on the trunk or the 0.2 release version?
> Robin
>
> On Tue, Jan 19, 2010 at 9:43 AM, sej <[email protected]> wrote:
>
>> Hello all,
>>
>> I am running PFP on a fairly large dataset and it works well for smaller
>> subsets of the data. However, once I attempt larger samples, I run into
>> this error in the reducer phase:
>>
>> 1) 10/01/19 00:25:35 INFO mapred.JobClient: Task Id : attempt_, Status : FAILED
>> Task attempt_ failed to report status for 607 seconds. Killing!
>>
>> I've also noticed that only one reducer is launched for the FP-Tree mining
>> phase. I've tried passing in -D mapred options, but it doesn't seem like
>> PFPGrowthJob supports them. Is there any way I can increase the timeout,
>> heap size, and/or number of reducers without explicitly changing the code
>> and recompiling?
>>
> This wouldn't be the case unless you specify number of groups = 1. Could you
> give some idea about your dataset?
>
>>
>> Also, from my understanding of the algorithm, as long as the number of
>> groups is higher than the number of features that are above min support,
>> each tree will be able to utilize the maximum available heap resources,
>> because each feature will be guaranteed to be mined separately. Is that
>> assumption correct?
>>
> No. The number of groups should always be lower than the number of features;
> otherwise it maxes out at the count of features, as there would be no
> features left to fill in the group. What is the number of features you are
> working on? As a rule of thumb, if the data is too large, try to keep 10-20
> features per group, and assign groups that way.
>
>> Thanks!
>> --sej
>>
>> p.s. The last few log lines output by the failed reducer:
>> 2010-01-18 13:06:39,418 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned items 9091
>> 2010-01-18 13:06:39,530 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 10000 Transactions
>> 2010-01-18 13:06:39,649 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 20000 Transactions
>> 2010-01-18 13:06:39,758 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 30000 Transactions
>> 2010-01-18 13:06:39,774 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP Tree: 40904
>> 2010-01-18 13:06:39,775 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 3393
>>
>> --
>> View this message in context:
>> http://old.nabble.com/PFP---failed-to-report-status----of-reducers-tp27220725p27220725.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>
> --
> ------
> Robin Anil
> Blog: http://techdigger.wordpress.com
> -------
> Try out Swipeball for iPhone
> Video: http://www.youtube.com/watch?v=3hvEbWHciwU
> iTunes: http://itunes.com/apps/swipeball
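Robin's rule of thumb (roughly 10-20 features per group, and never more groups than features) comes down to simple arithmetic. The following is a hypothetical Java sketch, not Mahout's PFPGrowth code: it assumes the frequent features are ranked 0..F-1 and split into contiguous groups of at most ceil(F/G) features, which is one plausible assignment scheme. The names `suggestGroups`, `maxPerGroup`, `groupOf` and `effectiveGroups` are illustrative only.

```java
/**
 * Hypothetical illustration of PFP-style feature grouping (not Mahout's code).
 * Assumes frequent features are ranked 0..numFeatures-1 and split into
 * contiguous groups of at most maxPerGroup(...) features each.
 */
final class GroupSizing {

    /** Ceiling division for positive ints. */
    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** Largest group size when numFeatures features are spread over numGroups groups. */
    static int maxPerGroup(int numFeatures, int numGroups) {
        if (numFeatures <= 0 || numGroups <= 0) {
            throw new IllegalArgumentException("numFeatures and numGroups must be positive");
        }
        return ceilDiv(numFeatures, numGroups);
    }

    /** Group id for the feature at the given rank, under contiguous assignment. */
    static int groupOf(int featureRank, int maxPerGroup) {
        return featureRank / maxPerGroup;
    }

    /** Number of groups that actually receive at least one feature. */
    static int effectiveGroups(int numFeatures, int numGroups) {
        return ceilDiv(numFeatures, maxPerGroup(numFeatures, numGroups));
    }

    /** Group count that keeps roughly featuresPerGroup features in each group. */
    static int suggestGroups(int numFeatures, int featuresPerGroup) {
        return Math.max(1, ceilDiv(numFeatures, featuresPerGroup));
    }

    public static void main(String[] args) {
        int numFeatures = 1000; // hypothetical count of features above min support
        int groups = suggestGroups(numFeatures, 20);
        System.out.println("groups=" + groups
                + ", perGroup=" + maxPerGroup(numFeatures, groups)); // groups=50, perGroup=20
        // More groups than features gains nothing: each non-empty group holds one feature.
        System.out.println("effective=" + effectiveGroups(numFeatures, 5000)); // effective=1000
    }
}
```

With a hypothetical 1000 frequent features and 20 per group, this suggests 50 groups; requesting 5000 groups still yields only 1000 non-empty ones, which is the "maxes out at the count of features" effect Robin describes.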
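The failed reducer was killed after 607 seconds without a status report, just past Hadoop's default `mapred.task.timeout` of 600000 ms (600 s). When changing the reducer code is an option, a common remedy is to call the framework's progress hook (for example `Reporter.progress()` in the old `org.apache.hadoop.mapred` API) periodically inside long mining loops. The sketch below is plain Java so it runs without Hadoop: `ProgressReporter` and `Heartbeat` are hypothetical names standing in for the real callback, and the 60-second interval is an arbitrary assumption.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a framework progress hook such as Hadoop's Reporter.progress(). */
interface ProgressReporter {
    void progress();
}

/** Calls the reporter at most once per interval so long loops keep signalling liveness. */
final class Heartbeat {
    private final ProgressReporter reporter;
    private final LongSupplier clockMillis;
    private final long intervalMillis;
    private long lastReportMillis;

    Heartbeat(ProgressReporter reporter, LongSupplier clockMillis, long intervalMillis) {
        this.reporter = reporter;
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
        this.lastReportMillis = clockMillis.getAsLong();
    }

    /** Call once per unit of work; returns true if progress was reported on this call. */
    boolean tick() {
        long now = clockMillis.getAsLong();
        if (now - lastReportMillis >= intervalMillis) {
            reporter.progress();
            lastReportMillis = now;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Report well inside the default 600 s timeout; 60 s is an arbitrary choice.
        Heartbeat hb = new Heartbeat(() -> System.out.println("progress"),
                System::currentTimeMillis, 60_000L);
        for (int i = 0; i < 1_000_000; i++) {
            // ... one unit of (hypothetical) mining work would go here ...
            hb.tick();
        }
    }
}
```

Injecting the clock as a `LongSupplier` keeps the throttle testable, and throttling avoids paying for a progress call on every iteration of a tight loop.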
