Hi Aniruddh,

Increasing the number of partitions doesn't always help in ALS because
of the communication/computation trade-off. What rank did you set? If
the rank is not large, I'd recommend a small number of partitions.
There are some other numbers to watch as well. Do you have super
popular items/users in your data?
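
For example, one quick way to check for hot items (a sketch, assuming
your ratings are an RDD[Rating]; the names and the cutoff of 10 are
illustrative):

    import org.apache.spark.mllib.recommendation.Rating
    import org.apache.spark.rdd.RDD

    // count ratings per product and print the most popular items
    def hottestItems(ratings: RDD[Rating]): Unit = {
      val counts = ratings.map(r => (r.product, 1L)).reduceByKey(_ + _)
      counts.top(10)(Ordering.by(_._2)).foreach(println)
    }

A heavily skewed distribution (a few items holding a large share of the
ratings) can blow up individual partitions no matter how many partitions
you use.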

Best,
Xiangrui

On Wed, Jul 8, 2015 at 5:32 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:
> Also try to increase the number of partitions gradually rather than in one
> big jump from 20 to 100: add e.g. 10 at a time and see whether there is a
> correlation with how much RAM the executors need.
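>
> A sketch of what stepping it could look like (the step size of 10 is just an
> example, assuming an existing ratings RDD):
>
>     // re-run the job at increasing partition counts, watching the Spark UI
>     for (parts <- 30 to 100 by 10) {
>       val stepped = ratings.repartition(parts)
>       // ... run ALS / the job on stepped and note executor memory here
>     }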
>
>
>
> From: Evo Eftimov [mailto:evo.efti...@isecc.com]
> Sent: Wednesday, July 8, 2015 1:26 PM
> To: 'Aniruddh Sharma'; 'user@spark.apache.org'
> Subject: RE: Out of Memory Errors on less number of cores in proportion to
> Partitions in Data
>
>
>
> Are you sure you have actually increased the RAM? How exactly did you do
> that, and does it show up in the Spark UI?
>
>
>
> Also use the Spark UI and the driver console to check the RAM allocated for
> each RDD and each RDD partition in each of the scenarios.
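>
> E.g. from the driver console, something like this (a sketch; note that
> getRDDStorageInfo is a developer API and reports only persisted RDDs):
>
>     // print memory used by each persisted RDD and its cached partitions
>     sc.getRDDStorageInfo.foreach { info =>
>       println(s"RDD ${info.id} '${info.name}': " +
>         s"${info.numCachedPartitions}/${info.numPartitions} cached, " +
>         s"${info.memSize / (1024 * 1024)} MB in memory")
>     }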
>
>
>
> Re b), the general rule of thumb is: number of partitions = 2 x number of
> CPU cores.
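>
> A sketch of that rule of thumb in code (defaultParallelism is typically the
> total number of cores granted to the application):
>
>     val target = 2 * sc.defaultParallelism  // 2 x num of CPU cores
>     val tuned = ratings.repartition(target)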
>
>
>
> All partitions are operated on in parallel (by independently running JVM
> threads). However, if you have a substantially higher number of partitions
> (JVM threads) than cores, you get what happens in any JVM or OS: the
> scheduler switches between the threads, and some of them sit suspended
> waiting for a free core (thread contexts also occupy additional RAM).
>
>
>
> From: Aniruddh Sharma [mailto:asharma...@gmail.com]
> Sent: Wednesday, July 8, 2015 12:52 PM
> To: Evo Eftimov
> Subject: Re: Out of Memory Errors on less number of cores in proportion to
> Partitions in Data
>
>
>
> Thanks for your reply...
>
> I increased executor memory from 4 GB to 35 GB and the out-of-memory error
> still happens. So it seems it may not be entirely due to extra buffers from
> the additional partitions.
>
> Query a) Is there a way to debug at a more granular level, from the
> user-code perspective, to see where things could go wrong?
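>
> For instance, would per-partition instrumentation along these lines be a
> reasonable way to narrow it down (just a sketch)?
>
>     // log how many records land in each partition, to spot skew
>     val sizes = ratings.mapPartitionsWithIndex { (idx, it) =>
>       Iterator((idx, it.size))
>     }.collect()
>     sizes.foreach { case (i, n) => println(s"partition $i: $n records") }
>     println(ratings.toDebugString)  // shows the RDD lineage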
>
>
>
> Query b)
>
> In general, my query is this: suppose it is not ALS (or some other iterative
> algorithm). Say it is some sample RDD with 10000 partitions, each executor
> holds 50 partitions, and each machine has 4 physical cores. Do the 4
> physical cores try to process all 50 partitions in parallel (multitasking),
> or do the 4 cores first process the first 4 partitions, then the next 4,
> and so on?
>
> Thanks and Regards
>
> Aniruddh
>
>
>
> On Wed, Jul 8, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
> This is most likely due to the internal implementation of ALS in MLlib.
> Probably, for each parallel unit of execution (partition, in Spark terms),
> the implementation allocates a RAM buffer where it keeps interim results
> during the ALS iterations.
>
>
>
> If we assume that the size of that internal RAM buffer is fixed per unit of
> execution, then total RAM (20 partitions x fixed buffer) < total RAM (100
> partitions x fixed buffer).
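>
> For illustration only (the buffer size is hypothetical): with 10 executors,
> 20 partitions means 2 partitions per executor, while 100 partitions means 10
> per executor. If each buffer were, say, ~400 MB, that is ~0.8 GB versus
> ~4 GB per executor, which is enough to exhaust a 4 GB executor heap in the
> second case.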
>
>
>
> From: Aniruddh Sharma [mailto:asharma...@gmail.com]
> Sent: Wednesday, July 8, 2015 12:22 PM
> To: user@spark.apache.org
> Subject: Out of Memory Errors on less number of cores in proportion to
> Partitions in Data
>
>
>
> Hi,
>
> I am new to Spark. I have done the following tests and I am confused by the
> conclusions. I have 2 queries.
>
> Following are the details of the test:
>
> Test 1) Used an 11-node cluster where each machine has 64 GB RAM and 4
> physical cores. I ran the ALS algorithm from MLlib on a 1.6 GB data set with
> 10 executors, and my Rating data set had 20 partitions. It works. In order
> to increase parallelism, I used 100 partitions instead of 20, and now the
> program does not work: it throws an out-of-memory error.
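>
> For reference, a minimal sketch of this kind of setup (not my exact code;
> the path and parameter values are placeholders):
>
>     import org.apache.spark.mllib.recommendation.{ALS, Rating}
>
>     val ratings = sc.textFile("hdfs://.../ratings.csv")  // placeholder path
>       .map { line =>
>         val Array(u, p, r) = line.split(',')
>         Rating(u.toInt, p.toInt, r.toDouble)
>       }
>       .repartition(100)  // 20 partitions work; 100 throws OOM
>
>     val model = ALS.train(ratings, 10 /* rank */, 10 /* iterations */,
>       0.01 /* lambda */)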
>
>
>
> Query a): I had 4 cores on each machine but 10 partitions in each executor,
> so my cores are not sufficient for the partitions. Is it supposed to give
> memory errors with this kind of misconfiguration? If there are not enough
> cores and the processing cannot all be done in parallel, can the different
> partitions not be processed sequentially, so that the operation just becomes
> slow rather than throwing a memory error?
>
> Query b) If it gives an error, the error message is not meaningful. Here my
> DAG was very simple and I could trace that lowering the number of partitions
> works, but if it throws an error on a misconfiguration of cores, how do I
> debug this in complex DAGs, since the error does not say explicitly that the
> problem could be a low number of cores? If my understanding is incorrect,
> then kindly explain the reasons for the error in this case.
>
>
>
> Thanks and Regards
>
> Aniruddh
>
>
