Hi Arun,
We have been running into the same issue (only 1,000 unique items across
100MM transactions), but have not investigated the root cause. We decided to
run this on a cluster instead (4*16 / 64GB RAM), after which the OOM issue
went away. However, we ran into the issue that
Hi,
This indicates you have duplicate products per row in your dataframe. The FP
implementation only allows unique products per row, so you will need to dedupe
them before running the FPGrowth algorithm.
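A minimal sketch of one way to do the dedupe with the RDD-based MLlib API
(assuming sc is the SparkContext; the file path, whitespace separator, and
minSupport value are illustrative, not from the thread):

import org.apache.spark.mllib.fpm.FPGrowth

// One transaction per line, items separated by whitespace (assumed format).
val transactions = sc.textFile("transactions.txt")
  .map(_.trim.split("\\s+").distinct)  // drop duplicate items within each row

val model = new FPGrowth()
  .setMinSupport(0.2)  // illustrative threshold
  .run(transactions)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + " -> " + itemset.freq)
}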
Best,
Patrick
From: "Devi P.V"
Not sure if it’s related, but in our Hadoop configuration we’re also setting
sc.hadoopConfiguration().set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
Cheers,
-patrick
From: Andy Davidson
Date: Friday, 12 February 2016 at 17:34
To: Igor
Hi,
I'm quite new to Spark and MapReduce, but I have a requirement to get all
distinct values, with their respective counts, from a transactional file (one
possible approach is sketched after the sample data). Let's assume the
following file format:
0 1 2 3 4 5 6 7
1 3 4 5 8 9
9 10 11 12 13 14 15 16 17 18
1 4 7 11 12 13 19 20
3 4 7 11 15 20 21 22 23
1 2 5 9 11
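A minimal sketch of one way to get the distinct items with their counts in
Spark (assuming sc is the SparkContext and the whitespace-separated format
above; the file name is hypothetical):

val counts = sc.textFile("transactions.txt")
  .flatMap(_.trim.split("\\s+"))  // one record per item occurrence
  .map(item => (item, 1L))
  .reduceByKey(_ + _)             // sum occurrences per distinct item

counts.collect().foreach { case (item, n) => println(s"$item\t$n") }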