[ 
https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794677#action_12794677
 ] 

Robin Anil commented on MAHOUT-221:
-----------------------------------

Ping! Need a code review

> Implementation of FP-Bonsai Pruning for fast pattern mining
> -----------------------------------------------------------
>
>                 Key: MAHOUT-221
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-221
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>         Attachments: MAHOUT-FPGROWTH.patch, MAHOUT-FPGROWTH.patch
>
>
> FP Bonsai is a method to prune long chained FP-Trees for faster growth. 
> http://win.ua.ac.be/~adrem/bibrem/pubs/fpbonsai.pdf
> This implementation also adds a transaction preprocessing map/reduce job 
> which converts a list of transactions {1, 2, 4, 5}, {1, 2, 3}, {1, 2} into a 
> tree structure and thus saves space during fpgrowth map/reduce 
> the tree formed from above is. For typical this improves the storage space by 
> a great amount and thus saves on time during shuffle and sort
> (1,3) -> (2,3) | - (4,1) - (5,1)
>                       (3,1)        
> Also added a reducer to PFPgrowth (not part of the original paper) which does 
> this compression and saves on space. 
> This patch also adds an example transaction dataset generator from flickr and 
> delicious data set 
> https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/DataSets/PINTSExperimentsDataSets/
> Both of them are GIG of tag data. Where "date userid itemid tag" is given. 
> The example maker creates a transaction based on all the unique tags a user 
> has tagged on an item. 
>          

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to