[ https://issues.apache.org/jira/browse/SPARK-38037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489225#comment-17489225 ]
zhengruifeng commented on SPARK-38037:
--------------------------------------

Could you please provide a simple script to reproduce this issue?

> Spark MLlib FPGrowth not working with 40+ items in Frequent Item set
> --------------------------------------------------------------------
>
>                 Key: SPARK-38037
>                 URL: https://issues.apache.org/jira/browse/SPARK-38037
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.2.0
>         Environment: Standalone Linux server
> 32 GB RAM
> 4 core
>            Reporter: RJ
>            Priority: Major
>
> We have been using Spark FPGrowth, and it works well with millions of
> transactions (records) when the number of items in a frequent itemset is
> less than 25. Beyond 25 it runs into a computational limit. For 40+ items in
> a frequent itemset the process never returns.
> To reproduce, you can create a simple data set of 3 transactions with the same
> items (40 of them) and run FPGrowth with 0.9 support; the process never
> completes. Below is the sample data I used to narrow down the problem:
> |I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
> |I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
> |I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
>
> While the number of frequent itemsets grows as 2^n - 1 with the number of
> items n in a frequent itemset, it surely should be able to handle 40 or more
> items in a frequent itemset.
>
> Is this an FPGrowth implementation limitation, or
> are there any tuning parameters that I am missing? Thank you.
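
For reference, a minimal Python sketch of the scenario described in the report, not a script taken from the ticket. It builds the 3 identical transactions and works out how many frequent itemsets exist when every transaction holds the same n items: every non-empty subset is frequent, giving 2^n - 1. The helper names (`make_transactions`, `expected_itemset_count`, `brute_force_count`, `run_fpgrowth`) are hypothetical. `run_fpgrowth` assumes the DataFrame-based API in `pyspark.ml.fpm` is the one in use, requires a local PySpark install, and is deliberately not called here, since at n = 40 it is the call the reporter says never completes.

```python
from itertools import combinations


def make_transactions(n_items, n_transactions=3):
    """Build identical transactions I1..In, mirroring the sample data in the report."""
    items = [f"I{i}" for i in range(1, n_items + 1)]
    return [list(items) for _ in range(n_transactions)]


def expected_itemset_count(n_items):
    """With identical transactions, every non-empty subset is frequent: 2^n - 1."""
    return 2 ** n_items - 1


def brute_force_count(transactions, min_support):
    """Count frequent itemsets by enumerating all subsets (only viable for small n)."""
    universe = sorted({item for t in transactions for item in t})
    as_sets = [set(t) for t in transactions]
    count = 0
    for k in range(1, len(universe) + 1):
        for combo in combinations(universe, k):
            support = sum(1 for s in as_sets if s.issuperset(combo)) / len(as_sets)
            if support >= min_support:
                count += 1
    return count


def run_fpgrowth(n_items, min_support=0.9):
    """Hypothetical repro; assumes pyspark is installed. Not called by default."""
    from pyspark.sql import SparkSession
    from pyspark.ml.fpm import FPGrowth

    spark = (SparkSession.builder.master("local[*]")
             .appName("SPARK-38037-repro").getOrCreate())
    df = spark.createDataFrame([(t,) for t in make_transactions(n_items)], ["items"])
    model = FPGrowth(itemsCol="items", minSupport=min_support).fit(df)
    return model.freqItemsets.count()


if __name__ == "__main__":
    # Cross-check the closed form against brute force on a small case.
    assert brute_force_count(make_transactions(5), 0.9) == expected_itemset_count(5)
    for n in (10, 25, 40):
        print(f"n={n}: {expected_itemset_count(n)} frequent itemsets")
```

At n = 25 the count is 33,554,431 and at n = 40 it is 1,099,511,627,775, so the number of itemsets to be produced grows by a factor of about 32,768 between the two sizes mentioned in the report.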
--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org