Re: FPGrowth does not handle large result sets

2016-01-13 Thread Sean Owen
As I said in your JIRA, the collect() in question is bringing results back to the driver to return them. The assumption is that there aren't a vast number of frequent items. If there are, they aren't 'frequent' and your min support is too low. On Wed, Jan 13, 2016 at 12:43 AM, Ritu Raj Tiwari

Re: FPGrowth does not handle large result sets

2016-01-13 Thread Ritu Raj Tiwari
Thanks Sean! I'll start with a higher support threshold and work my way down. On Wednesday, January 13, 2016 8:57 AM, Sean Owen wrote: You're looking for subsets of items that appear in at least 200 of 200,000 transactions, which could be a whole lot. Keep in mind
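To illustrate the "start high and work down" approach on a toy scale, here is a minimal sketch. It uses a brute-force itemset counter, not FP-growth and not Spark, and the names `frequent_itemsets`, `lower_until`, and `max_results` are hypothetical, chosen only for this example. The point is only that the result set grows quickly as the threshold drops, so it helps to stop before it exceeds what the driver can comfortably hold.

```python
import math
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets up to max_size (toy data only)."""
    min_count = math.ceil(min_support * len(transactions))
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {s: c for s, c in counts.items() if c >= min_count}

def lower_until(transactions, thresholds, max_results):
    """Try thresholds from high to low and keep the last one whose
    result set still fits within max_results."""
    chosen = None
    for s in thresholds:
        result = frequent_itemsets(transactions, s)
        if len(result) > max_results:
            break
        chosen = (s, result)
    return chosen

# Made-up transactions, purely for illustration.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
for s in (1.0, 0.75, 0.5, 0.25):
    print(s, len(frequent_itemsets(toy, s)))

support, itemsets = lower_until(toy, (1.0, 0.75, 0.5, 0.25), max_results=5)
print("lowest threshold within budget:", support)
```

On real data the brute-force counter would be replaced by the actual mining job; only the outer loop is the idea being sketched.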

Re: FPGrowth does not handle large result sets

2016-01-13 Thread Ritu Raj Tiwari
Hi Sean: Thanks for checking out my question here. It's possible I am making a newbie error. Based on my dataset of about 200,000 transactions and a minimum support level of 0.001, I am looking for items that appear at least 200 times. Given that the items in my transactions are drawn from a set
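The arithmetic behind the "at least 200 times" figure can be sketched as follows: a relative minimum support is turned into an absolute transaction count by multiplying by the number of transactions. The helper name `min_count` is made up for illustration, and rounding up is an assumption about how a fractional result would be treated.

```python
import math

def min_count(min_support: float, num_transactions: int) -> int:
    """Absolute number of transactions an itemset must appear in."""
    return math.ceil(min_support * num_transactions)

# 200,000 transactions, as described in the message above.
for s in (0.05, 0.01, 0.001):
    print(f"min support {s}: at least {min_count(s, 200_000)} transactions")
```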

Re: FPGrowth does not handle large result sets

2016-01-13 Thread Sean Owen
You're looking for subsets of items that appear in at least 200 of 200,000 transactions, which could be a whole lot. Keep in mind there are 25,000 items, sure, but already 625,000,000 possible pairs of items, and trillions of possible 3-item subsets. This sounds like it's just far too low. Start
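For a sense of scale, the size of the candidate space can be computed directly. This is a small sketch using the 25,000-item figure from the message above; the function name `candidate_itemsets` is hypothetical. Note that 25,000 squared gives 625,000,000, which counts ordered pairs; counting unordered pairs gives roughly 312 million, and unordered 3-item subsets come to about 2.6 trillion. The order of magnitude, and the conclusion, is the same either way.

```python
import math

NUM_ITEMS = 25_000  # item count quoted in the message above

def candidate_itemsets(num_items: int, size: int) -> int:
    """Number of distinct unordered itemsets of the given size."""
    return math.comb(num_items, size)

pairs = candidate_itemsets(NUM_ITEMS, 2)
triples = candidate_itemsets(NUM_ITEMS, 3)
ordered_pairs = NUM_ITEMS ** 2  # counts (a, b) and (b, a) separately

print(f"unordered pairs:   {pairs:,}")
print(f"ordered pairs:     {ordered_pairs:,}")
print(f"unordered triples: {triples:,}")
```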

Re: FPGrowth does not handle large result sets

2016-01-12 Thread Ritu Raj Tiwari
I have been giving it 8-12G.
-Raj
> On Jan 12, 2016, at 6:50 PM, Sabarish Sasidharan wrote:
>
> How much RAM are you giving to the driver? 17000 items being collected
> shouldn't fail unless your driver memory is too low.
>
> Regards

FPGrowth does not handle large result sets

2016-01-12 Thread Ritu Raj Tiwari
Folks: We are running into a problem where FPGrowth seems to choke on data sets that we think are not too large. We have about 200,000 transactions. Each transaction is composed of, on average, 50 items. There are about 17,000 unique items (SKUs) that might show up in any transaction. When