[ https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil resolved MAHOUT-221.
-------------------------------

    Resolution: Fixed

Committed

> Implementation of FP-Bonsai Pruning for fast pattern mining
> -----------------------------------------------------------
>
>                 Key: MAHOUT-221
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-221
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-FPGROWTH.patch, MAHOUT-FPGROWTH.patch
>
>
> FP-Bonsai is a method to prune long-chained FP-Trees for faster growth.
> http://win.ua.ac.be/~adrem/bibrem/pubs/fpbonsai.pdf
>
> This implementation also adds a transaction preprocessing map/reduce job
> which converts a list of transactions {1, 2, 4, 5}, {1, 2, 3}, {1, 2} into a
> tree structure and thus saves space during the fpgrowth map/reduce.
> The tree formed from the transactions above is:
>
>   (1,3) -> (2,3) -+- (4,1) - (5,1)
>                   |
>                   +- (3,1)
>
> Each node is (item, count). For typical data this greatly reduces the
> storage space and thus saves time during shuffle and sort.
>
> Also added a reducer to PFPgrowth (not part of the original paper) which does
> this compression and saves on space.
>
> This patch also adds an example transaction dataset generator for the flickr
> and delicious data sets:
> https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/DataSets/PINTSExperimentsDataSets/
> Both contain gigabytes of tag data, where each record has the form
> "date userid itemid tag".
> The example maker creates one transaction from all the unique tags a user
> has applied to an item.
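
The transaction-to-tree compression described in the issue can be illustrated with a small sketch. This is not the code from the attached patch and not Mahout's API; the names Node, build_prefix_tree and flatten are hypothetical, and the sketch assumes the items of every transaction are already in one consistent order so that shared prefixes line up. It only shows how the three example transactions collapse into the counted tree given in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Node:
    """One tree node: an item and the number of transactions passing through it."""
    item: int
    count: int = 0
    children: Dict[int, "Node"] = field(default_factory=dict)


def build_prefix_tree(transactions: Iterable[List[int]]) -> Node:
    """Merge transactions that share a prefix into a single counted path."""
    root = Node(item=-1)  # sentinel root with no item (a choice made for this sketch)
    for transaction in transactions:
        node = root
        # Assumption: items are already sorted consistently across transactions.
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def flatten(node: Node, path: Tuple[int, ...] = ()) -> List[Tuple[Tuple[int, ...], int]]:
    """Return (path from root, count) for every node, depth first."""
    out = []
    for child in node.children.values():
        child_path = path + (child.item,)
        out.append((child_path, child.count))
        out.extend(flatten(child, child_path))
    return out


tree = build_prefix_tree([[1, 2, 4, 5], [1, 2, 3], [1, 2]])
nodes = flatten(tree)
for path, count in nodes:
    print(path, count)
```

The three transactions hold nine item occurrences in total, while the tree stores them in five nodes: (1,3), (2,3), (4,1), (5,1) and (3,1).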