Re: coalesce and executor memory

2016-02-14 Thread Sabarish Sasidharan
[Excerpt contains only the quoted reply headers of Silvio Fiorito's earlier "RE: coalesce and executor memory" message (Sunday, February 14, 2016, 8:27 AM Eastern, to Christopher Brady and Koert Kuipers, Cc: user@spark.apache.org).]

Re: coalesce and executor memory

2016-02-14 Thread Christopher Brady
Actually, rereading your email I see you're caching. But 'cache' uses MEMORY_ONLY. Do you see errors about losing partitions as your job is running? Are you sure you need to cache if you're just saving to disk? Can you try the coalesce…
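A minimal Scala sketch of the alternative being hinted at here; the paths, the map function, and the partition count are placeholders, not details from the thread. The idea is to skip the cache when the data is only written out once, or to persist with a level that can spill instead of dropping partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CoalesceWithoutCache {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-sketch"))

    val mapped = sc.textFile("hdfs:///tmp/input")
      .map(line => line.toUpperCase)   // stand-in for a light per-record reformat

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY): partitions that
    // don't fit are dropped and recomputed later. If the RDD is written out exactly
    // once, the cache buys nothing, so the simplest fix is to skip it:
    mapped.coalesce(24).saveAsTextFile("hdfs:///tmp/output")

    // If the RDD genuinely is reused, MEMORY_AND_DISK spills instead of dropping:
    // mapped.persist(StorageLevel.MEMORY_AND_DISK)

    sc.stop()
  }
}
```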

RE: coalesce and executor memory

2016-02-14 Thread Silvio Fiorito
[Excerpt contains only the quoted headers of Christopher Brady's reply of Friday, February 12, 2016, 8:34 PM, to Koert Kuipers and Silvio Fiorito (Cc: user@spark.apache.org), beginning "Thank you for the responses. The m…"]

Re: coalesce and executor memory

2016-02-13 Thread Daniel Darabos
On Fri, Feb 12, 2016 at 11:10 PM, Koert Kuipers wrote: "in spark, every partition needs to fit in the memory available to the core processing it." That does not agree with my understanding of how it works. I think you could do…
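A sketch of the map-only case this reply seems to have in mind (spark-shell style, with sc provided; the paths and partition count are invented). A pure record-by-record map followed by a save streams each partition, so the whole partition never has to sit in memory at once.

```scala
sc.textFile("hdfs:///tmp/big-input")
  .coalesce(8)                        // few, very large partitions
  .map(line => line.split(",")(0))    // each record is handled and emitted one at a time
  .saveAsTextFile("hdfs:///tmp/out")
```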

Re: coalesce and executor memory

2016-02-13 Thread Koert Kuipers
Sorry, I meant to say: and my way to deal with OOMs is almost always simply to increase the number of partitions. Maybe there is a better way that I am not aware of. On Sat, Feb 13, 2016 at 11:38 PM, Koert Kuipers wrote: "that's right, it's the reduce operation that makes the…"
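A minimal sketch of that workaround (spark-shell; the path and all partition counts are placeholders): when a stage OOMs, raise the partition count so each task handles less data.

```scala
val rdd = sc.textFile("hdfs:///tmp/input")   // suppose this reads as ~200 partitions

// repartition() shuffles into more, smaller partitions:
val finer = rdd.repartition(2000)

// Shuffle-producing operations also accept an explicit partition count directly:
val counts = finer
  .map(line => (line.take(1), 1L))
  .reduceByKey(_ + _, 2000)
```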

Re: coalesce and executor memory

2016-02-13 Thread Koert Kuipers
That's right, it's the reduce operation that makes the in-memory assumption, not the map (although I am still suspicious that the map actually streams from disk to disk record by record). In reality, though, my experience is that if Spark cannot fit partitions in memory it doesn't work well. I get…
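Not an example from the thread, but an illustration of where that in-memory assumption on the reduce side usually bites; the path and key extraction are invented.

```scala
val pairs = sc.textFile("hdfs:///tmp/events")
  .map(line => (line.split("\t")(0), 1L))

// groupByKey materialises every value for a key in a single task's memory before
// your function runs, so one hot key can blow up a task:
val grouped = pairs.groupByKey().mapValues(_.sum)

// reduceByKey combines map-side and merges values incrementally, so per-task
// memory use stays roughly constant:
val reduced = pairs.reduceByKey(_ + _)
```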

Re: coalesce and executor memory

2016-02-12 Thread Silvio Fiorito
Coalesce essentially reduces parallelism, so fewer cores are getting more records. Be aware that it could also lead to loss of data locality, depending on how far you reduce. Depending on what you're doing in the map operation, it could lead to OOM errors. Can you give more details as to what…
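A sketch of that trade-off, with invented paths and counts: coalesce with shuffle = true (equivalent to repartition) is one way to keep the upstream map at full parallelism and only reduce the partition count for the final write.

```scala
val data = sc.textFile("hdfs:///tmp/input")   // say this reads as 500 partitions

// Plain coalesce is a narrow dependency: the map itself now runs in only 10 tasks,
// each seeing ~50x the records, hence the OOM and data-locality concerns.
data.map(_.trim).coalesce(10).saveAsTextFile("hdfs:///tmp/out-narrow")

// coalesce(10, shuffle = true), the same as repartition(10), keeps the map at full
// parallelism and only shuffles down to 10 partitions for the write.
data.map(_.trim).coalesce(10, shuffle = true).saveAsTextFile("hdfs:///tmp/out-shuffled")
```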

Re: coalesce and executor memory

2016-02-12 Thread Koert Kuipers
In Spark, every partition needs to fit in the memory available to the core processing it. As you coalesce, you reduce the number of partitions, increasing partition size. At some point the partition no longer fits in memory. On Fri, Feb 12, 2016 at 4:50 PM, Silvio Fiorito…
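One quick way (a suggestion, not something from the thread; path and target count invented) to see how much a given coalesce target inflates partition sizes before committing to it:

```scala
val coalesced = sc.textFile("hdfs:///tmp/input").coalesce(10)

coalesced
  .mapPartitionsWithIndex((i, it) => Iterator((i, it.size)))   // records per partition
  .collect()
  .foreach { case (i, n) => println(s"partition $i: $n records") }
```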

Re: coalesce and executor memory

2016-02-12 Thread Christopher Brady
Thank you for the responses. The map function just changes the format of the record slightly, so I don't think that would be the cause of the memory problem. So if I have 3 cores per executor, I need to be able to fit 3 partitions per executor within whatever I specify for the executor…
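A back-of-envelope sizing sketch for this last point; every number here is an assumption for illustration, not a value from the thread.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("sizing-sketch")
  .set("spark.executor.cores", "3")      // up to 3 tasks, hence 3 partitions, in flight per executor
  .set("spark.executor.memory", "12g")   // roughly 4g of heap per concurrently running task
// Note that in Spark 1.6 only spark.memory.fraction (default 0.75) of the heap is
// shared between execution and cached storage, so the usable space per partition
// is smaller still.
```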