[
https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044975#comment-13044975
]
Yarco Hayduk commented on MAHOUT-709:
-------------------------------------
Thank you for the suggestion. Frankly, I have not run into a definition of "not
strictly closed frequent item sets". Patterns are either closed, free, or
maximal (there are more definitions; these are only the most common ones).
Hence, the implementation is flawed if its javadoc states that it outputs
closed itemsets, and in reality it does not.
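To make the term concrete: under the usual definition, a frequent itemset is
closed when no proper superset has the same support. The following is only a
minimal sketch of that check as a post-filter over already-mined patterns. The
class ClosedItemsetFilter and the method closedOnly are hypothetical names for
illustration, not part of Mahout, and the quadratic scan is chosen for clarity
rather than speed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not Mahout's API.
final class ClosedItemsetFilter {

    // Keeps only the itemsets for which no proper superset in the input
    // has the same support count.
    static Map<Set<String>, Long> closedOnly(Map<Set<String>, Long> frequent) {
        Map<Set<String>, Long> closed = new LinkedHashMap<>();
        for (Map.Entry<Set<String>, Long> candidate : frequent.entrySet()) {
            boolean subsumed = false;
            for (Map.Entry<Set<String>, Long> other : frequent.entrySet()) {
                boolean properSuperset =
                        other.getKey().size() > candidate.getKey().size()
                        && other.getKey().containsAll(candidate.getKey());
                if (properSuperset && other.getValue().equals(candidate.getValue())) {
                    subsumed = true; // same support, larger set: not closed
                    break;
                }
            }
            if (!subsumed) {
                closed.put(candidate.getKey(), candidate.getValue());
            }
        }
        return closed;
    }
}
```

For example, given {a} with support 4, {b} with support 3 and {a, b} with
support 3, the filter drops {b}, because {a, b} contains it and has the same
support, and keeps {a} and {a, b}.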
I will verify that the decision to traverse the tree top down (at a certain
level) leads to better performance and will investigate how to eliminate the
issue of redundant itemsets in that scenario.
Can you please elaborate on this line of code, as I am having a hard time
understanding it:
"minSupportValue = Math.max(minSupportValue,minSupport.longValue() / 2);"
Why would we ever want to change the minsup? I assume that this idea comes from
the
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.9324&rep=rep1&type=pdf
paper?
> FP-Growth Redundant patterns
> ----------------------------
>
> Key: MAHOUT-709
> URL: https://issues.apache.org/jira/browse/MAHOUT-709
> Project: Mahout
> Issue Type: Bug
> Components: Frequent Itemset/Association Rule Mining
> Affects Versions: 0.4, 0.5
> Reporter: Yarco Hayduk
> Assignee: Robin Anil
> Labels: fp-growth, frequent, parallel, pattern
> Fix For: 0.6
>
> Attachments: SixTransactions.dat, bresult-new.txt, dumpedPatterns,
> patterns-converted.txt
>
>
> The algorithm outputs more patterns than needed.
> I have compared Mahout's PFP-Growth algorithm against the
> http://www.borgelt.net/fpgrowth.html FP-Growth implementation. This
> implementation has an option to generate closed patterns too.
> When I filtered out the sub-patterns from the output of Parallel FP-Growth, I
> arrived at the same result as http://www.borgelt.net/fpgrowth.html produces.
> In short, you are not outputting closed itemsets.
> I am attaching the dummy DB along with the output of both algorithms.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira