Re: Market Basket Analysis by deploying FP Growth algorithm

2017-04-05 Thread Patrick Plaatje
Hi Arun, We have been running into the same issue (with only 1000 unique items in 100MM transactions), but have not investigated the root cause. We decided to run this on a cluster instead (4*16 / 64GB RAM), after which the OOM issue went away. However, we ran into the issue that

Re: FP growth - Items in a transaction must be unique

2017-02-02 Thread Patrick Plaatje
Hi, This indicates you have duplicate products per row in your dataframe. The FP implementation only allows unique products per row, so you will need to remove the duplicate products from each row before running the FPGrowth algorithm. Best, Patrick From: "Devi P.V" Date:
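
To illustrate the per-row dedupe described above, here is a minimal sketch in plain Python rather than Spark. The function name `dedupe_transactions` and the sample baskets are hypothetical and not taken from the thread; the sketch assumes each transaction is a list of hashable items.

```python
def dedupe_transactions(transactions):
    """Remove repeated items within each transaction, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct item and preserves insertion order.
    return [list(dict.fromkeys(items)) for items in transactions]


# Hypothetical sample data: the second basket lists "bread" twice.
baskets = [["milk", "bread"], ["bread", "eggs", "bread"]]
unique_baskets = dedupe_transactions(baskets)
print(unique_baskets)  # [['milk', 'bread'], ['bread', 'eggs']]
```

On a Spark RDD of item lists, the same per-row step could presumably be applied with `map` before handing the data to FPGrowth.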

Re: newbie unable to write to S3 403 forbidden error

2016-02-13 Thread Patrick Plaatje
Not sure if it’s related, but in our Hadoop configuration we’re also setting sc.hadoopConfiguration().set("fs.s3.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem"); Cheers, -patrick From: Andy Davidson Date: Friday, 12 February 2016 at 17:34 To: Igor

Getting top distinct strings from arraylist

2016-01-25 Thread Patrick Plaatje
Hi, I’m quite new to Spark and MR, but have a requirement to get all distinct values with their respective counts from a transactional file. Let’s assume the following file format: 0 1 2 3 4 5 6 7 1 3 4 5 8 9 9 10 11 12 13 14 15 16 17 18 1 4 7 11 12 13 19 20 3 4 7 11 15 20 21 22 23 1 2 5 9 11
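
One way to picture the requirement above, as a minimal sketch in plain Python rather than Spark: it assumes each transaction is one line of whitespace-separated values. The helper name `count_items` and the sample lines are hypothetical and are not the file from the message.

```python
from collections import Counter


def count_items(lines):
    """Count how often each distinct value occurs across all transaction lines."""
    counts = Counter()
    for line in lines:
        # Split on whitespace; every token counts once per occurrence.
        counts.update(line.split())
    return counts


# Hypothetical sample: three short transactions, one per line.
sample_lines = ["0 1 2 3", "1 3 4", "3 4 7"]
counts = count_items(sample_lines)
print(counts.most_common(1))  # [('3', 3)]
```

In Spark, the equivalent shape would be a flat map over the tokens followed by a count per key.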