Hi Aniruddh,
Increasing the number of partitions doesn't always help in ALS because of the
communication/computation trade-off. What rank did you set? If the rank is not
large, I'd recommend a small number of partitions. There are some other numbers
to watch. Do you have super popular items/users in your dataset?
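In case it helps, setting those two knobs with the RDD-based ALS API looks
roughly like this (a sketch only, assuming a spark-shell sc; the file name,
parsing, and values are made up for illustration):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    // Parse ratings from a user,item,rating CSV (the path is just an example)
    val ratings: RDD[Rating] = sc.textFile("ratings.csv").map { line =>
      val Array(user, item, value) = line.split(",")
      Rating(user.toInt, item.toInt, value.toDouble)
    }

    // rank and the number of blocks (how the factor matrices are split up)
    // are the two knobs above; with a small rank, keep the block count modest
    val rank = 10
    val numBlocks = 20
    val model = ALS.train(ratings, rank, 10 /* iterations */, 0.01 /* lambda */, numBlocks)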
This is most likely due to the internal implementation of ALS in MLlib. Probably,
for each parallel unit of execution (a partition, in Spark terms), the
implementation allocates and uses a RAM buffer where it keeps interim results
during the ALS iterations.
If we assume that the size of that
Hi,
I am new to Spark. I have done the following tests and I am confused by the
conclusions. I have 2 queries.

Following are the details of the tests:

Test 1) Used an 11-node cluster where each machine has 64 GB RAM and 4
physical cores. I ran an ALS algorithm using MLlib on a 1.6 GB data set. I
ran 10 executors.
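For reference, the executor setup I mean corresponds roughly to the following
configuration (a sketch; 40g per executor is an assumption, and on a standalone
cluster the knobs differ slightly):

    import org.apache.spark.{SparkConf, SparkContext}

    // 10 executors with 4 cores each on the 11-node / 64 GB cluster described above
    val conf = new SparkConf()
      .setAppName("ALS test")                     // app name is arbitrary
      .set("spark.executor.instances", "10")      // honoured on YARN
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "40g")        // assumed per-executor heap
    val sc = new SparkContext(conf)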
Are you sure you have actually increased the RAM (how exactly did you do that,
and does it show in the Spark UI)?

Also use the Spark UI and the driver console to check the RAM allocated for
each RDD and RDD partition in each of the scenarios.
Re b) the general rule is num of partitions = 2 x the number of CPU cores.
Also try to increase the number of partitions gradually: not in one big jump
from 20 to 100, but adding e.g. 10 at a time, and see whether there is a
correlation with adding more RAM to the executors.
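A sketch of what I mean by stepping the partitions up gradually (the step
sizes and ALS parameters are placeholders, and it assumes the ratings RDD
from earlier in the thread):

    import org.apache.spark.mllib.recommendation.ALS

    // Increase partitions in small steps rather than one big jump
    for (numPartitions <- Seq(20, 30, 40, 50)) {
      val repartitioned = ratings.repartition(numPartitions)
      val model = ALS.train(repartitioned, 10 /* rank */, 10, 0.01, numPartitions)
      // note the run time and per-RDD memory in the Spark UI for this setting
    }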