[ 
https://issues.apache.org/jira/browse/SPARK-38037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489225#comment-17489225
 ] 

zhengruifeng commented on SPARK-38037:
--------------------------------------

Could you please provide a simple script to reproduce this issue?

> Spark MLlib FPGrowth not working with 40+ items in Frequent Item set
> --------------------------------------------------------------------
>
>                 Key: SPARK-38037
>                 URL: https://issues.apache.org/jira/browse/SPARK-38037
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.2.0
>         Environment: Standalone Linux server
> 32 GB RAM
> 4 core
>  
>            Reporter: RJ
>            Priority: Major
>
> We have been using Spark FPGrowth and it works well with millions of 
> transactions (records) when the number of items in the Frequent Itemset is 
> less than 25. Beyond 25 it runs into a computational limit. For 40+ items in 
> the Frequent Itemset the process never returns.
> To reproduce, create a simple data set of 3 transactions with identical 
> items (40 of them) and run FPGrowth with 0.9 minimum support; the process 
> never completes. Below is the sample data I used to narrow down the problem:
> |I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
> |I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
> |I1|I2|I3|I4|I5|I6|I7|I8|I9|I10|I11|I12|I13|I14|I15|I16|I17|I18|I19|I20|I21|I22|I23|I24|I25|I26|I27|I28|I29|I30|I31|I32|I33|I34|I35|I36|I37|I38|I39|I40|
>  
> While the computation grows as 2^n - 1 with the number n of items in the 
> Frequent Itemset, it surely should be able to handle 40 or more items in a 
> Frequent Itemset.
>  
> Is this an FPGrowth implementation limitation,
> or are there any tuning parameters that I am missing? Thank you.
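
To put the 2^n - 1 growth from the description in concrete terms, here is a minimal plain-Python sketch. It is not Spark's FPGrowth implementation; it is a brute-force counter, and the helper names (count_frequent_itemsets, identical_transactions) are hypothetical, introduced only for illustration. It assumes the data shape described above: 3 identical transactions and a minimum support of 0.9, so every non-empty subset of the items is frequent.

```python
from itertools import combinations


def count_frequent_itemsets(transactions, min_support):
    """Brute-force count of itemsets contained in at least
    min_support * len(transactions) transactions (illustration only)."""
    items = sorted({item for t in transactions for item in t})
    min_count = min_support * len(transactions)
    total = 0
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            hits = sum(1 for t in transactions if set(candidate) <= t)
            if hits >= min_count:
                total += 1
    return total


def identical_transactions(n_items, n_rows=3):
    """Build n_rows identical transactions over items I1..In (hypothetical helper)."""
    row = frozenset(f"I{i}" for i in range(1, n_items + 1))
    return [row] * n_rows


# Small cases confirm the 2^n - 1 pattern for identical transactions.
for n in (1, 5, 10):
    assert count_frequent_itemsets(identical_transactions(n), 0.9) == 2**n - 1

# For n = 40 the same formula gives the size of the result itself;
# enumerating it by brute force is not attempted here.
expected_40 = 2**40 - 1
print(expected_40)  # 1099511627775
```

Under these assumptions the complete answer for the 40-item sample contains about 1.1 trillion frequent itemsets, so any algorithm that materializes every one of them has that much output to produce, independent of tuning parameters.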



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
