Hello all,

I am running PFP on a fairly large dataset and it works well for smaller
subsets of the data.  However, once I attempt larger samples, I run into
this error in the reducer phase:

1)  10/01/19 00:25:35 INFO mapred.JobClient: Task Id : attempt_, Status : FAILED
    Task attempt_ failed to report status for 607 seconds. Killing!

I've also noticed that only one reducer is launched for the FP-Tree mining
phase. I've tried passing in -D mapred options, but PFPGrowthJob doesn't seem
to support them. Is there any way I can increase the task timeout, the heap
size, and/or the number of reducers without changing the code and
recompiling?

Also, from my understanding of the algorithm: as long as the number of
groups is higher than the number of features above min support, each
feature is guaranteed to be mined separately, so each tree should be able
to use the maximum available heap. Is that assumption correct?

Thanks!
--sej

P.S.
The last few log lines output by the failed reducer:
2010-01-18 13:06:39,418 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned items 9091
2010-01-18 13:06:39,530 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 10000 Transactions
2010-01-18 13:06:39,649 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 20000 Transactions
2010-01-18 13:06:39,758 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: FPTree Building: Read 30000 Transactions
2010-01-18 13:06:39,774 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP Tree: 40904
2010-01-18 13:06:39,775 INFO org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all patterns with 3393


-- 
View this message in context: 
http://old.nabble.com/PFP---failed-to-report-status----of-reducers-tp27220725p27220725.html
Sent from the Mahout User List mailing list archive at Nabble.com.
