[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617939#comment-14617939 ] Amit Gupta commented on SPARK-7337:
---
I see your comment as a workaround: by telling me to set minSupport to 1.0, you are asking me to find the item sequences that appear in all transactions and then work backwards down to the breaking point. I am not looking for a workaround, since I already have custom code written against the core Spark API that works fine; it is only the FPGrowth API that breaks, which I wanted to try out. When the tree grows beyond RAM it should spill over to disk rather than throw an OutOfMemoryError. To reproduce on a single machine, try the recommendation/sequence-of-items data from the site below: https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data

> FPGrowth algo throwing OutOfMemoryError
> ---------------------------------------
>
>              Key: SPARK-7337
>              URL: https://issues.apache.org/jira/browse/SPARK-7337
>          Project: Spark
>       Issue Type: Bug
>       Components: MLlib
> Affects Versions: 1.3.1
>      Environment: Ubuntu
>         Reporter: Amit Gupta
>      Attachments: FPGrowthBug.png
>
> When running the FPGrowth algorithm on data of several GB with numPartitions=500, it throws an OutOfMemoryError after some time.
> The algorithm runs correctly up to "collect at FPGrowth.scala:131", which creates 500 tasks. It fails at the next stage, "flatMap at FPGrowth.scala:150", which does not create 500 tasks but instead an internally calculated 17 tasks.
> Please refer to the attached screenshot.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617290#comment-14617290 ] Xiangrui Meng commented on SPARK-7337:
---
[~amit.gupta.niit-tech] Any updates?
[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601348#comment-14601348 ] Xiangrui Meng commented on SPARK-7337:
---
How large is the `minSupport`? The number of frequent itemsets grows exponentially as minSupport decreases, so please start with a really large value (close to 1.0) and gradually reduce it.
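The exponential blow-up described above can be seen even on a toy dataset. The following standalone sketch (plain Java, no Spark; the transactions and thresholds are made up purely for illustration) brute-forces frequent itemsets at several minSupport values and shows the count climbing steeply as the threshold drops:

```java
import java.util.*;

public class FrequentItemsetGrowth {
    // Count how many itemsets meet the support threshold, by brute-force
    // enumeration of all subsets of the item universe (fine for toy data;
    // real FP-growth avoids exactly this enumeration).
    static int countFrequent(List<Set<String>> transactions, double minSupport) {
        List<String> items = new ArrayList<>(new TreeSet<>(
            transactions.stream().flatMap(Set::stream).toList()));
        int frequent = 0;
        // Enumerate every non-empty subset of the item universe via a bitmask.
        for (int mask = 1; mask < (1 << items.size()); mask++) {
            Set<String> candidate = new HashSet<>();
            for (int i = 0; i < items.size(); i++)
                if ((mask & (1 << i)) != 0) candidate.add(items.get(i));
            long support = transactions.stream()
                .filter(t -> t.containsAll(candidate)).count();
            if ((double) support / transactions.size() >= minSupport) frequent++;
        }
        return frequent;
    }

    public static void main(String[] args) {
        // Hypothetical toy transactions; real inputs would be far larger.
        List<Set<String>> tx = List.of(
            Set.of("a", "b", "c", "d"),
            Set.of("a", "b", "c", "e"),
            Set.of("a", "b", "d", "e"),
            Set.of("a", "c", "d", "e"),
            Set.of("a", "b", "c", "d"));
        for (double s : new double[] {1.0, 0.8, 0.6, 0.4, 0.2}) {
            System.out.printf(Locale.ROOT, "minSupport=%.1f -> %d frequent itemsets%n",
                s, countFrequent(tx, s));
        }
    }
}
```

On a huge real dataset the same curve is what exhausts driver memory, which is why starting near 1.0 and lowering minSupport step by step is the advised probing strategy.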
[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527874#comment-14527874 ] Amit Gupta commented on SPARK-7337:
---
Yes, I know that anything collected on the driver that does not fit in memory will fail. Here I am talking about the line below:

    FPGrowthModel model = new FPGrowth()
        .setMinSupport(minSupport)
        .setNumPartitions(numPartition)
        .run(transactions);

where numPartition is > 50 (i.e. 500). Please refer to the screenshot: the active stage (belonging to the API call above) is the one throwing the OutOfMemoryError. The stage just before the active one (status: completed) has 500 tasks, while the active stage has only 17 tasks; it should have 500, since I set numPartition to 500. The next stage, pending execution, again has 500 tasks. If the code around the active stage with 17 tasks is fixed to create a number of tasks equal to numPartition, the OutOfMemoryError will be fixed.
[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527221#comment-14527221 ] Joseph K. Bradley commented on SPARK-7337:
---
Do you know how many distinct items are being collected to the driver? If it's too many, then you'll have to increase minCount. If that's the issue, then this is not a bug, but a limitation of the implementation.
[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526536#comment-14526536 ] Amit Gupta commented on SPARK-7337:
---
I am running it in "local" mode. It should spill over to hard disk. I can clearly see that the next stage does not create 500 tasks, hence the OutOfMemoryError.
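One practical note on local mode: the driver and executors share a single JVM, so both the driver-side collect and the per-partition FP-trees draw on the same heap. A hedged mitigation (not a fix for the task-count question above) is simply giving that JVM more memory; the 8g value and the job class/jar names below are illustrative, not recommendations:

```shell
# Local mode runs everything in one JVM, so --driver-memory bounds the
# whole job. The 8g figure and the class/jar names are hypothetical.
spark-submit \
  --master "local[*]" \
  --driver-memory 8g \
  --class com.example.FPGrowthJob \
  fpgrowth-job.jar
```
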