[ 
https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044975#comment-13044975
 ] 

Yarco Hayduk commented on MAHOUT-709:
-------------------------------------

Thank you for the suggestion. Frankly, I have not run into a definition of "not 
strictly closed frequent item sets". Patterns are either closed, free, or 
maximal (there are more definitions; these are only the most common ones). 
Hence, the implementation is flawed if its javadoc states that it outputs 
closed itemsets when in reality it does not. 
I will verify that the decision to traverse the tree top down (at a certain 
level) leads to better performance and will investigate how to eliminate the 
issue of redundant itemsets in that scenario. 
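
To make the definition I am relying on concrete: an itemset is closed if no 
proper superset has the same support. Below is a minimal sketch of that check 
as a post-filter. It assumes the input map holds the complete set of frequent 
itemsets with their supports; the class and method names are hypothetical and 
are not part of Mahout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
class ClosedItemsetFilter {

    // Keeps only closed itemsets: those with no proper superset of equal support.
    // Assumes 'frequent' contains every frequent itemset with its support.
    static Map<Set<String>, Long> closedOnly(Map<Set<String>, Long> frequent) {
        Map<Set<String>, Long> closed = new LinkedHashMap<>();
        for (Map.Entry<Set<String>, Long> candidate : frequent.entrySet()) {
            boolean isClosed = true;
            for (Map.Entry<Set<String>, Long> other : frequent.entrySet()) {
                // A strictly larger set that contains the candidate is a proper superset.
                boolean properSuperset =
                        other.getKey().size() > candidate.getKey().size()
                        && other.getKey().containsAll(candidate.getKey());
                if (properSuperset && other.getValue().equals(candidate.getValue())) {
                    isClosed = false;
                    break;
                }
            }
            if (isClosed) {
                closed.put(candidate.getKey(), candidate.getValue());
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        Map<Set<String>, Long> frequent = new LinkedHashMap<>();
        frequent.put(Set.of("a"), 3L);
        frequent.put(Set.of("b"), 2L);
        frequent.put(Set.of("a", "b"), 2L);
        // {b} is dropped because {a, b} has the same support.
        System.out.println(closedOnly(frequent));
    }
}
```

This pairwise comparison is quadratic in the number of patterns, so it is only 
meant for checking small outputs, not as a fix for the algorithm itself. 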

Can you please elaborate on this line of code, as I am having a hard time 
understanding it:
"minSupportValue = Math.max(minSupportValue,minSupport.longValue() / 2);"
Why would we ever want to change the minsup? I assume that this idea comes from 
the following paper?
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.9324&rep=rep1&type=pdf
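
For reference, this is how I read the expression in isolation: it can only 
raise the threshold, never lower it, and the floor is half of minSupport using 
integer division. The sketch below assumes both values are plain longs; the 
class and parameter names are hypothetical, and the surrounding Mahout code is 
not reproduced.

```java
// Hypothetical stand-alone reading of the quoted expression; not Mahout code.
class MinSupportFloor {

    // Returns the larger of the current threshold and half of minSupport
    // (integer division, so odd values round down).
    static long raise(long minSupportValue, long minSupport) {
        return Math.max(minSupportValue, minSupport / 2);
    }

    public static void main(String[] args) {
        System.out.println(raise(1, 10)); // 5: threshold raised to minSupport / 2
        System.out.println(raise(7, 10)); // 7: already above the floor, unchanged
        System.out.println(raise(1, 3));  // 1: 3 / 2 truncates to 1
    }
}
```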

> FP-Growth Redundant patterns
> ----------------------------
>
>                 Key: MAHOUT-709
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-709
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4, 0.5
>            Reporter: Yarco Hayduk
>            Assignee: Robin Anil
>              Labels: fp-growth, frequent, parallel, pattern
>             Fix For: 0.6
>
>         Attachments: SixTransactions.dat, bresult-new.txt, dumpedPatterns, 
> patterns-converted.txt
>
>
> The algorithm outputs more patterns than needed. 
> I have compared Mahout's PFP-Growth algorithm against the 
> http://www.borgelt.net/fpgrowth.html FP-Growth implementation. That 
> implementation has an option to generate closed patterns too. 
> When I filtered out the sub-patterns from the output of Parallel FP-Growth, I 
> arrived at the same result as http://www.borgelt.net/fpgrowth.html
> In short, you are not outputting closed itemsets.
> I am attaching the dummy DB along with the output of both algorithms.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
