[ 
https://issues.apache.org/jira/browse/MAHOUT-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved MAHOUT-629.
------------------------------

    Resolution: Invalid

Looks interesting, but we can't act on this without a current patch.
                
> FP Growth performance improvement
> ---------------------------------
>
>                 Key: MAHOUT-629
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-629
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4, 0.5
>            Reporter: Jaroslaw Odzga
>            Assignee: Robin Anil
>         Attachments: FPGrowth.java, performance_improvement.txt
>
>
> Instead of calculating patterns ending with given attribute multiple times 
> they can be calculated just once. Depending on for how many features patterns 
> are generated, speedup can be huge. More feature included - greater speedup. 
> For test data set (88162 real life 'basket' transactions), if all features 
> were selected (i.e. we want to generate patterns for all items in 
> transactions), patterns generation time dropped from 1h 15min to 8sec. For 
> parallel fpgrowth, where the number of requested features is limited the 
> speedup is not that dramatic, but still noticeable. Basically work done is 
> always smaller than before the patch (as patterns for each item are 
> calculated at most once).
> The improvement is a variation of base algorithm in situation where we want 
> to generate patterns for only subset of items (let's call this subset A). 
> Given that items are ordered by descending frequency it is enough to 
> calculate only patterns ending on any item with frequency smaller or equal to 
> the most frequent item in the subset A. The heaps for each item are 
> initialized upfront and merged after processing every item.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to