[
https://issues.apache.org/jira/browse/SPARK-12163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-12163:
---------------------------------
Labels: bulk-closed (was: )
> FPGrowth unusable on some datasets without extensive tweaking of the support
> threshold
> --------------------------------------------------------------------------------------
>
> Key: SPARK-12163
> URL: https://issues.apache.org/jira/browse/SPARK-12163
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Jaroslav Kuchar
> Priority: Minor
> Labels: bulk-closed
>
> This problem occurs on standard machine learning UCI datasets.
> Details for the "audiology" dataset follow: it contains only 226 transactions
> and 70 attributes. Mining frequent itemsets with a support threshold of 0.95
> produces 73,162,705 itemsets; with a threshold of 0.94, 366,880,771 itemsets.
> More details about experiment:
> https://gist.github.com/jaroslav-kuchar/edbcbe72c5a884136db1
> The number of generated itemsets grows rapidly with the number of unique
> items per transaction. Given this combinatorial explosion, many settings of
> the support threshold trigger CPU-intensive, long-running tasks. The
> extensive tweaking of the support threshold this requires makes the FPGrowth
> implementation impractical even for small datasets.
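A minimal sketch (illustration only, not Spark code) of why the itemset count explodes: whenever m items all co-occur in at least minSupport of the transactions, every non-empty subset of those m items is itself frequent, so those items alone contribute 2^m - 1 frequent itemsets.

```python
# Sketch: combinatorial explosion of frequent itemsets.
# If m items all co-occur in at least min_support of the transactions,
# every non-empty subset of those m items is itself frequent,
# contributing 2**m - 1 itemsets on its own.

def itemsets_from(m: int) -> int:
    """Number of non-empty subsets of m mutually co-frequent items."""
    return 2 ** m - 1

for m in (10, 20, 27):
    print(f"{m} co-occurring items -> {itemsets_from(m):,} itemsets")
```

Around 27 mutually co-frequent items already yield over 134 million itemsets, the same order of magnitude as the counts reported above for the audiology dataset.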
> It would be useful to implement additional stopping criteria to control the
> explosion of the itemset count in FPGrowth. We propose adding an optional
> limit on the maximum number of generated itemsets or on the maximum number
> of items per itemset.
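A toy level-wise miner (illustration only, not Spark's FP-tree algorithm) showing how the proposed per-itemset size limit would bound the search; the `max_items` parameter here is the hypothetical stopping criterion from this report, not an existing FPGrowth option.

```python
def frequent_itemsets(transactions, min_support, max_items=None):
    """Toy level-wise frequent-itemset miner.

    `max_items` is the hypothetical cap proposed in this report: stop
    growing candidate itemsets once they reach that many items. Spark's
    FPGrowth has no such parameter; this is illustrative pseudocode.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    results = {}
    k, level = 1, [frozenset([i]) for i in items]
    while level and (max_items is None or k <= max_items):
        survivors = []
        for cand in level:
            support = sum(cand <= t for t in transactions) / n
            if support >= min_support:
                results[cand] = support
                survivors.append(cand)
        # Grow (k+1)-item candidates from the surviving k-itemsets.
        level = list({a | b for a in survivors for b in survivors
                      if len(a | b) == k + 1})
        k += 1
    return results

tx = [frozenset("abc"), frozenset("ab"), frozenset("abcd")]
print(frequent_itemsets(tx, 1.0))               # {a}, {b}, {a,b}
print(frequent_itemsets(tx, 1.0, max_items=1))  # {a}, {b} only
```

With `max_items` set, the while-loop exits before enumerating larger itemsets, so the output size is bounded regardless of how low the support threshold is.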
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]