[ https://issues.apache.org/jira/browse/MAHOUT-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-629.
------------------------------

    Resolution: Invalid

Looks interesting, but we can't act on this without a current patch.

> FP Growth performance improvement
> ---------------------------------
>
>                 Key: MAHOUT-629
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-629
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4, 0.5
>            Reporter: Jaroslaw Odzga
>            Assignee: Robin Anil
>         Attachments: FPGrowth.java, performance_improvement.txt
>
>
> Instead of calculating patterns ending with a given attribute multiple times,
> they can be calculated just once. Depending on how many features patterns
> are generated for, the speedup can be huge: the more features included, the
> greater the speedup.
> For the test data set (88162 real-life 'basket' transactions), if all features
> were selected (i.e. we want to generate patterns for all items in
> transactions), pattern generation time dropped from 1h 15min to 8sec. For
> parallel fpgrowth, where the number of requested features is limited, the
> speedup is not that dramatic, but still noticeable. Basically, the work done
> is always smaller than before the patch (as patterns for each item are
> calculated at most once).
> The improvement is a variation of the base algorithm for the situation where
> we want to generate patterns for only a subset of items (let's call this
> subset A). Given that items are ordered by descending frequency, it is enough
> to calculate only patterns ending on any item with frequency smaller than or
> equal to that of the most frequent item in subset A. The heaps for each item
> are initialized upfront and merged after processing every item.
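
For readers without the attachments, the idea in the description can be sketched roughly as follows. This is not the code from FPGrowth.java: it swaps the FP-tree for brute-force enumeration of each ending item's prefixes, so it only suits toy inputs. The function name `top_k_patterns`, its parameters, and pushing each pattern straight into the heaps of the requested items it contains (rather than merging heaps after each item) are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import combinations


def top_k_patterns(transactions, requested, min_support=2, k=3):
    """Top-k frequent patterns per requested item (illustrative sketch only)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_support]
    # Order items by descending frequency; ties are broken by item name.
    order = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}

    requested = [i for i in requested if i in rank]
    if not requested:
        return {}

    # Only ending items at or below the most frequent requested item matter:
    # a pattern containing a requested item must end on that item or on a
    # less frequent one.
    start = min(rank[i] for i in requested)
    heaps = {i: [] for i in requested}  # one heap per requested item, created upfront
    tx_sets = [frozenset(i for i in t if i in rank) for t in transactions]

    for end in order[start:]:
        # Each ending item is processed exactly once. Brute force stands in
        # for the conditional FP-tree here (assumption of this sketch).
        support = Counter()
        for t in tx_sets:
            if end not in t:
                continue
            prefix = sorted((i for i in t if rank[i] < rank[end]), key=rank.get)
            for n in range(len(prefix) + 1):
                for combo in combinations(prefix, n):
                    support[combo + (end,)] += 1

        for pattern, s in support.items():
            if s < min_support:
                continue
            # Share the pattern with every requested item it contains.
            for item in pattern:
                if item in heaps:
                    heapq.heappush(heaps[item], (s, pattern))
                    if len(heaps[item]) > k:
                        heapq.heappop(heaps[item])  # drop the weakest entry

    return {i: sorted(h, key=lambda e: (-e[0], e[1])) for i, h in heaps.items()}


baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
result = top_k_patterns(baskets, requested=["b"])
```

Because every frequent pattern has exactly one least frequent item, enumerating patterns by ending item visits each pattern once, which is the "at most once" property the description relies on. In the call above, patterns ending on `a` are never generated, since `a` ranks ahead of the only requested item `b`.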