[ https://issues.apache.org/jira/browse/SPARK-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Dus updated SPARK-6143:
-----------------------------
Description:
It is more convenient for a person to analyze the closed forms of frequent
itemsets (and of patterns in general).
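An itemset X is closed in a data set D if there exists no proper super-itemset
Y such that Y has the same support as X in D. Closed frequent itemsets are
therefore a lossless compression of all frequent itemsets: the support of any
frequent itemset equals the largest support among its closed supersets.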
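To illustrate the lossless-compression point, here is a minimal Python sketch that recovers the support of any itemset from the closed frequent itemsets alone. The helper support_from_closed and the toy data are hypothetical illustrations, not an MLlib API.

```python
from typing import Dict, FrozenSet

Itemset = FrozenSet[str]


def support_from_closed(x: Itemset, closed: Dict[Itemset, int]) -> int:
    """Recover the support of itemset x from the closed frequent itemsets alone.

    The support of x equals the largest support among its closed supersets.
    A result of 0 means x has no closed frequent superset, i.e. x is not frequent.
    """
    return max((s for c, s in closed.items() if x <= c), default=0)


# Hypothetical toy data: closed frequent itemsets (min support 2) for the
# transactions {a,b,c}, {a,b}, {a,c}, {a}.
CLOSED: Dict[Itemset, int] = {
    frozenset("a"): 4,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
}

if __name__ == "__main__":
    for name in ["a", "b", "c", "ab", "bc"]:
        print(name, support_from_closed(frozenset(name), CLOSED))
```

Here {b} and {c} are not stored, yet their support (2) is recovered from {a,b} and {a,c}.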
A naive approach is to find all frequent itemsets and then remove each one
that is a proper subset of another frequent itemset with the same support.
But this can be very costly, because all frequent itemsets still have to be
generated.
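A minimal Python sketch of the naive approach, assuming transactions are given as frozensets of item names. The brute-force frequent_itemsets function is a hypothetical stand-in for FP-Growth (it is not MLlib's implementation); closed_naive then applies the "proper subset with equal support" filter.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset], min_support: int) -> Dict[Itemset, int]:
    """Brute-force stand-in for FP-Growth: enumerate every frequent itemset."""
    items = sorted(set().union(*transactions))
    result: Dict[Itemset, int] = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            x = frozenset(combo)
            support = sum(1 for t in transactions if x <= t)
            if support >= min_support:
                result[x] = support
                found = True
        if not found:
            # Anti-monotonicity: no frequent k-itemset means no frequent (k+1)-itemset.
            break
    return result


def closed_naive(frequent: Dict[Itemset, int]) -> Dict[Itemset, int]:
    """Keep x unless some proper superset y has the same support.

    Checking only frequent supersets is enough: a superset with the same
    support as a frequent x is itself frequent.
    """
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == t for y, t in frequent.items())
    }


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
    freq = frequent_itemsets(data, min_support=2)
    print(len(freq), "frequent itemsets ->", len(closed_naive(freq)), "closed")
```

The filter compares every pair of frequent itemsets, so it adds work quadratic in their number on top of the full enumeration.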
A more powerful idea is to apply some kind of merging during the mining
process itself. I have heard about the FPClose algorithm, which is based on
FP-Growth:
[http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 of the paper)
I think this could be more useful for MLlib users.
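FPClose (with its CFI-tree) does not fit in a short sketch. As a loosely related illustration of producing closed itemsets directly instead of filtering all frequent ones, here is a hypothetical Python sketch of a different, much simpler technique: intersecting transactions. It is not FPClose and not an MLlib API.

```python
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def closed_by_intersection(transactions: List[Itemset], min_support: int) -> Dict[Itemset, int]:
    """Mine closed frequent itemsets without enumerating all frequent itemsets.

    Every closed itemset with support >= 1 is the intersection of the
    transactions that contain it, and every such intersection is closed, so
    the closed sets can be built by intersecting each new transaction with
    the candidates found so far. This is NOT FPClose.
    """
    candidates: Set[Itemset] = set()
    for t in transactions:
        # Merge step: t itself, plus its intersection with every earlier candidate.
        candidates |= {t} | {c & t for c in candidates}
    result: Dict[Itemset, int] = {}
    for c in candidates:
        if not c:
            continue  # skip the empty itemset
        support = sum(1 for t in transactions if c <= t)
        if support >= min_support:
            result[c] = support
    return result


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
    for itemset, support in closed_by_intersection(data, min_support=2).items():
        print(sorted(itemset), support)
```

This sketch does not prune by support while it runs, and its candidate set can grow exponentially, so it only illustrates the idea of avoiding a full enumeration of frequent itemsets.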
was:
It is more convenient for a person to analyze the closed forms of frequent
itemsets (and of patterns in general).
An itemset X is closed in a data set D if there exists no proper super-itemset
Y such that Y has the same support as X in D. Closed frequent itemsets are
therefore a lossless compression of all frequent itemsets.
A naive approach is to find all frequent itemsets and then remove each one
that is a proper subset of another frequent itemset with the same support.
But this can be very costly, because all frequent itemsets still have to be
generated.
A more powerful idea is to apply some kind of merging during the mining
process itself. I have heard about the FPClose algorithm, which is based on
FP-Growth:
[http://users.encs.concordia.ca/~grahne/papers/fimi03.pdf] (Section 4 of the paper)
> Improve FP-Growth for mining closed-forms of frequent patterns
> --------------------------------------------------------------
>
> Key: SPARK-6143
> URL: https://issues.apache.org/jira/browse/SPARK-6143
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Denis Dus
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)