I have been giving it 8-12G

-Raj
> On Jan 12, 2016, at 6:50 PM, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
>
> How much RAM are you giving to the driver? Collecting 17,000 items shouldn't fail unless your driver memory is too low.
>
> Regards
> Sab
>
>> On 13-Jan-2016 6:14 am, "Ritu Raj Tiwari" <rituraj_tiw...@yahoo.com.invalid> wrote:
>> Folks:
>> We are running into a problem where FPGrowth seems to choke on data sets that we think are not too large. We have about 200,000 transactions. Each transaction contains about 50 items on average, and there are about 17,000 unique items (SKUs) that might appear in any transaction.
>>
>> When running locally with 12G of RAM given to the PySpark process, the FPGrowth code fails with an out-of-memory error for a minSupport of 0.001. The failure occurs when we try to enumerate and save the frequent itemsets. Looking at the FPGrowth code (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala), it seems this is because the genFreqItems() method tries to collect() all items. Is there a way the code could be rewritten so it does not try to collect, and therefore store, all frequent itemsets in memory?
>>
>> Thanks for any insights.
>>
>> -Raj
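For what it's worth, a back-of-the-envelope check (plain Python, numbers taken from the thread) suggests the 17,000 unique items are not the problem; it is the set of frequent *itemsets* that can explode combinatorially at minSupport=0.001. The size-5 subset count below is a worst-case illustration, not a measurement from this data set.

```python
# Why collecting frequent itemsets can blow up even though the number
# of unique items (17,000) is small. Numbers come from the thread; the
# explosion estimate is a worst-case illustration.
from math import comb

num_transactions = 200_000
min_support = 0.001
avg_items_per_transaction = 50

# FPGrowth keeps an itemset only if it appears in at least this many
# transactions: 0.001 * 200,000 = 200.
support_count_threshold = int(min_support * num_transactions)
print(support_count_threshold)  # 200

# A single 50-item transaction contains C(50, k) distinct k-item
# subsets. If overlap across transactions pushes many of these over the
# 200-transaction threshold, the set of frequent itemsets that the
# driver ultimately holds grows combinatorially.
subsets_of_size_5 = comb(avg_items_per_transaction, 5)
print(subsets_of_size_5)  # 2,118,760 size-5 subsets per transaction
```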
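One possible workaround, sketched below under stated assumptions: in the RDD-based MLlib API, `model.freqItemsets()` returns an RDD, so the itemsets can be streamed to disk with `saveAsTextFile()` instead of pulled into driver memory with `collect()`. (The `genFreqItems()` collect inside FPGrowth only gathers the frequent *single* items, so it is bounded by the 17,000 SKUs; it is the user-side collect of all frequent itemsets that grows large.) The input path, output path, and line format here are illustrative assumptions, and this sketch has not been tested at the scale described in the thread.

```python
# Sketch: save frequent itemsets without collecting them on the driver.
# Paths, partition count, and the input format are assumptions.
from pyspark import SparkContext
from pyspark.mllib.fpm import FPGrowth

sc = SparkContext(appName="fpgrowth-no-collect")

# Assumed format: one transaction per line, items separated by spaces.
transactions = sc.textFile("hdfs:///data/transactions.txt") \
                 .map(lambda line: line.strip().split(" "))

model = FPGrowth.train(transactions, minSupport=0.001, numPartitions=64)

# freqItemsets() is an RDD of FreqItemset(items, freq); writing it out
# keeps the itemsets distributed instead of materializing them all in
# driver memory the way collect() does.
model.freqItemsets() \
     .map(lambda fi: "{}\t{}".format(",".join(fi.items), fi.freq)) \
     .saveAsTextFile("hdfs:///output/frequent-itemsets")
```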