[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError

2015-07-07 Thread Amit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617939#comment-14617939
 ] 

Amit Gupta commented on SPARK-7337:
---

I see your comment as a workaround: when you say to set minSupport to 1.0, you 
are asking me to find the item sequences that appear in all transactions and 
then work backwards to the breaking point.
I am not looking for a workaround. I already have custom code written against 
the core Spark API that works fine; it is only the FP-Growth API, which I 
wanted to try out, that breaks.
When the tree grows beyond RAM, it should spill over to disk rather than throw 
an OutOfMemoryError.

To reproduce, try the data from the site below for recommendation/sequencing 
of items, on a single machine: 
https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data.
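The spill-versus-OOM question comes down to the size of the FP-tree itself. The sketch below is a minimal prefix-tree node counter, not the MLlib implementation: it only assumes the usual FP-Growth convention of inserting items in descending global-frequency order, and shows that the tree's node count (and hence its memory footprint) depends on how much the transactions share common prefixes, not just on how many transactions there are.

```java
import java.util.*;

// Minimal FP-tree sketch (a prefix tree keyed by item), NOT the MLlib
// implementation. It illustrates one point: the tree's node count -- and so
// its memory footprint -- is driven by how much the transactions share
// common prefixes.
public class FpTreeSketch {

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        int count = 0;
    }

    // Inserts each transaction along a root-to-leaf path, items ordered by
    // descending global frequency (ties alphabetical). Shared prefixes reuse
    // existing nodes; distinct suffixes allocate new ones.
    static int countNodes(List<List<String>> transactions) {
        // Global item frequencies, used to impose a canonical item order.
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> t : transactions)
            for (String item : t) freq.merge(item, 1, Integer::sum);

        Node root = new Node();
        int nodes = 0;
        for (List<String> t : transactions) {
            List<String> ordered = new ArrayList<>(t);
            ordered.sort((x, y) -> {
                int byFreq = Integer.compare(freq.get(y), freq.get(x)); // frequent items first
                return byFreq != 0 ? byFreq : x.compareTo(y);
            });
            Node cur = root;
            for (String item : ordered) {
                Node next = cur.children.get(item);
                if (next == null) {               // a genuinely new node is allocated
                    next = new Node();
                    cur.children.put(item, next);
                    nodes++;
                }
                next.count++;
                cur = next;
            }
        }
        return nodes; // excludes the root
    }

    public static void main(String[] args) {
        // Heavy sharing: identical baskets collapse onto one path.
        List<List<String>> shared = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("a", "b"), Arrays.asList("a", "b"));
        // No sharing: every basket allocates its own path.
        List<List<String>> disjoint = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c", "d"), Arrays.asList("e", "f"));
        System.out.println("shared baskets   -> " + countNodes(shared) + " nodes");   // 2
        System.out.println("disjoint baskets -> " + countNodes(disjoint) + " nodes"); // 6
    }
}
```

On market-basket data like the Kaggle set above, long and diverse baskets share few prefixes, so the tree compresses poorly and its size can grow toward that of the raw data.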

> FPGrowth algo throwing OutOfMemoryError
> ---
>
> Key: SPARK-7337
> URL: https://issues.apache.org/jira/browse/SPARK-7337
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.3.1
> Environment: Ubuntu
>Reporter: Amit Gupta
> Attachments: FPGrowthBug.png
>
>
> When running the FPGrowth algorithm with data in the GBs and with 
> numPartitions=500, it throws an OutOfMemoryError after some time.
> The algorithm runs correctly up to "collect at FPGrowth.scala:131", where it 
> creates 500 tasks. It fails at the next stage, "flatMap at 
> FPGrowth.scala:150", which creates only 17 internally calculated tasks 
> instead of 500.
> Please refer to the attached screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError

2015-07-07 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617290#comment-14617290
 ] 

Xiangrui Meng commented on SPARK-7337:
--

[~amit.gupta.niit-tech] Any updates?




[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError

2015-06-25 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601348#comment-14601348
 ] 

Xiangrui Meng commented on SPARK-7337:
--

How large is `minSupport`? The number of frequent itemsets grows 
exponentially as minSupport decreases, so please start with a really large 
value (close to 1.0) and gradually reduce it.
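The exponential growth is visible even on toy data. The brute-force counter below is an illustration only (MLlib does not enumerate candidates this way): it checks every possible itemset against the support threshold, so it is usable only on a handful of distinct items.

```java
import java.util.*;

// Illustration only (not the MLlib implementation): brute-force count of the
// frequent itemsets in a tiny transaction database. Runtime is exponential in
// the number of distinct items, so this is strictly for toy data.
public class FrequentItemsetGrowth {

    // Counts non-empty itemsets whose support (fraction of transactions that
    // contain every item of the set) is at least minSupport.
    static int countFrequentItemsets(List<Set<String>> transactions, double minSupport) {
        SortedSet<String> distinct = new TreeSet<>();
        for (Set<String> t : transactions) distinct.addAll(t);
        List<String> items = new ArrayList<>(distinct);
        int n = items.size(), frequent = 0;

        // Each bitmask over the distinct items encodes one candidate itemset.
        for (int mask = 1; mask < (1 << n); mask++) {
            int containing = 0;
            for (Set<String> t : transactions) {
                boolean containsAll = true;
                for (int i = 0; i < n && containsAll; i++) {
                    if ((mask & (1 << i)) != 0 && !t.contains(items.get(i))) containsAll = false;
                }
                if (containsAll) containing++;
            }
            if ((double) containing / transactions.size() >= minSupport) frequent++;
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<Set<String>> tx = Arrays.asList(
                new HashSet<>(Arrays.asList("a", "b", "c")),
                new HashSet<>(Arrays.asList("a", "b", "d")),
                new HashSet<>(Arrays.asList("a", "c", "e")),
                new HashSet<>(Arrays.asList("b", "d", "e")));
        // Frequent-itemset count climbs quickly as the threshold drops:
        // 1.0 -> 0, 0.75 -> 2, 0.5 -> 8, 0.25 -> 18.
        for (double s : new double[]{1.0, 0.75, 0.5, 0.25}) {
            System.out.println("minSupport=" + s + " -> "
                    + countFrequentItemsets(tx, s) + " frequent itemsets");
        }
    }
}
```

Even on this four-transaction example the count goes 0, 2, 8, 18 as minSupport drops from 1.0 to 0.25, which is why starting near 1.0 and lowering it gradually keeps the result set, and the driver memory it occupies, under control.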




[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError

2015-05-04 Thread Amit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527874#comment-14527874
 ] 

Amit Gupta commented on SPARK-7337:
---

Yes, I know that anything collected on the driver that does not fit in memory 
will fail. Here I am talking about the line below:

FPGrowthModel model = new FPGrowth()
    .setMinSupport(minSupport)
    .setNumPartitions(numPartition)
    .run(transactions);

where numPartition is > 50 (in my case, 500). Please refer to the screenshot: 
the active stage (belonging to the API call above) is the one throwing the 
OutOfMemoryError. The stage just before the active one (status: completed) has 
500 tasks, while the active stage has only 17 tasks even though I set 
numPartition to 500. The next stage, pending execution, again has 500 tasks. 
If the code around the active stage with 17 tasks were fixed to create a 
number of tasks equal to numPartition, the OutOfMemoryError would be fixed.




[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError

2015-05-04 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527221#comment-14527221
 ] 

Joseph K. Bradley commented on SPARK-7337:
--

Do you know how many distinct items are being collected to the driver? If it 
is too many, you will have to increase minCount. If that is the issue, then 
this is not a bug but a limitation of the implementation.




[jira] [Commented] (SPARK-7337) FPGrowth algo throwing OutOfMemoryError

2015-05-04 Thread Amit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526536#comment-14526536
 ] 

Amit Gupta commented on SPARK-7337:
---

I am running it in "local" mode. It should spill over to disk instead of 
failing. I can clearly see that the next stage does not create 500 tasks, 
which is why the OutOfMemoryError occurs.
