memory vs data_size

2014-09-30 Thread anny9699
Hi, Is there any guidance on how much total memory is needed, for a dataset of a given size, to achieve reasonably good speed? I have a dataset of around 200 GB, and the total memory across my 8 machines is around 120 GB. Is that too small to process data this big? Even the read

Re: memory vs data_size

2014-09-30 Thread Liquan Pei
Hi, By default, 60% of JVM memory is reserved for RDD caching, so in your case roughly 72 GB is available for RDDs, which means that your total data may not fit in memory. You can check the RDD memory statistics via the Storage tab in the web UI. Hope this helps! Liquan
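The 60% figure above corresponds to the Spark 1.x setting `spark.storage.memoryFraction` (default 0.6), the fraction of executor heap reserved for cached RDD blocks. A quick sanity check of the arithmetic, using the numbers from this thread (variable names here are illustrative, not Spark configuration keys):

```python
# Back-of-the-envelope cache capacity under Spark 1.x defaults.
# spark.storage.memoryFraction defaults to 0.6: the fraction of the
# JVM heap reserved for RDD storage (cached blocks).
total_heap_gb = 120.0    # total executor memory across the 8 machines
memory_fraction = 0.6    # default spark.storage.memoryFraction
data_gb = 200.0          # size of the dataset in the question

cache_capacity_gb = total_heap_gb * memory_fraction
print(cache_capacity_gb)             # 72.0 GB available for cached RDDs
print(data_gb <= cache_capacity_gb)  # False: 200 GB will not fully fit in cache
```

If a job does little or no caching, `spark.storage.memoryFraction` can be lowered to leave more heap for shuffle and task execution.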

Re: memory vs data_size

2014-09-30 Thread Debasish Das
Only keep data in memory where you want to run an iterative algorithm. For map-reduce operations, it's better not to cache if you have a memory crunch... Also schedule the persist and unpersist calls such that you utilize the RAM well...
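The persist/unpersist scheduling Debasish describes, cache only the RDD an iterative algorithm reuses and release it as soon as the iterations finish, can be sketched in PySpark roughly as follows. This is a sketch, not runnable code: it assumes a live Spark cluster, and `parse_point`, `initial_model`, and `train_step` are hypothetical functions standing in for the application logic.

```python
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="persist-unpersist-sketch")

raw = sc.textFile("hdfs:///data/input")  # illustrative path
points = raw.map(parse_point)            # parse_point is hypothetical

# Persist only the RDD the iterative algorithm will re-read on every pass.
points.persist(StorageLevel.MEMORY_ONLY)

model = initial_model()                  # hypothetical
for _ in range(10):
    model = train_step(model, points)    # each pass hits the cached RDD

# Release the cached blocks as soon as the iterations are done, so
# subsequent map-reduce stages are not starved of storage memory.
points.unpersist()
```

The point of calling `unpersist()` explicitly, rather than waiting for eviction, is that it frees storage memory deterministically at the moment the iterative phase ends.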