Denis Dus created SPARK-6143:
--------------------------------

             Summary: Improve FP-Growth for mining closed-forms of frequent 
patterns
                 Key: SPARK-6143
                 URL: https://issues.apache.org/jira/browse/SPARK-6143
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Denis Dus
            Priority: Minor


Closed forms of frequent itemsets (and of patterns in general) are more 
convenient for a person to analyze.
An itemset X is closed in a data set D if there exists no proper super-itemset 
Y such that Y has the same support as X in D. The closed frequent itemsets are 
therefore a lossless compression of all frequent itemsets.
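
To illustrate why nothing is lost: the support of any frequent itemset equals 
the largest support among the closed frequent itemsets that contain it. Below 
is a minimal Python sketch of that lookup. The function name and the toy data 
are hypothetical and are not part of MLlib.

```python
def support_from_closed(itemset, closed):
    """Recover the support of `itemset` from closed frequent itemsets only.

    `closed` maps frozenset -> support count. Returns None when no closed
    frequent itemset contains `itemset`, i.e. when it is not frequent.
    """
    supports = [s for c, s in closed.items() if itemset <= c]
    return max(supports) if supports else None


# Hypothetical toy data: transactions {a,b,c}, {a,b}, {a,b}, {c} with a
# minimum support count of 2. The frequent itemsets are a, b, ab (support 3)
# and c (support 2); only ab and c are closed.
closed_itemsets = {
    frozenset("ab"): 3,
    frozenset("c"): 2,
}

support_a = support_from_closed(frozenset("a"), closed_itemsets)
support_ac = support_from_closed(frozenset("ac"), closed_itemsets)
```

Here `support_a` is recovered from the closed itemset {a, b}, while {a, c} is 
contained in no closed frequent itemset and is therefore reported as not 
frequent.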

A naive approach is to find all frequent itemsets and then remove every one 
that is a proper subset of another frequent itemset with the same support. 
This can be very costly, because all frequent itemsets still have to be 
generated.
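
The naive post-filter could look roughly like the Python sketch below. It 
assumes the complete set of frequent itemsets is already available as a 
mapping from itemset to support count. The function name and the toy input 
are hypothetical, and this is not how MLlib represents its results.

```python
def naive_closed(frequent):
    """Keep only itemsets that have no proper superset of equal support.

    `frequent` maps frozenset -> support count for ALL frequent itemsets.
    The pairwise comparison is quadratic in the number of frequent itemsets,
    on top of the cost of generating them in the first place.
    """
    return {
        x: sx
        for x, sx in frequent.items()
        if not any(x < y and sx == sy for y, sy in frequent.items())
    }


# Hypothetical toy input: all frequent itemsets with their support counts.
frequent_itemsets = {
    frozenset("a"): 3,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("ab"): 3,
}
closed = naive_closed(frequent_itemsets)
```

In this toy input {a} and {b} are dropped because {a, b} has the same support, 
so only {a, b} and {c} remain.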

A more powerful idea is to do some kind of merging during the mining process 
itself. I've heard about the FPClose algorithm, which is based on FP-growth:
[http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 of the 
paper)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
