>
> ----- Original Message -----
> From: silvio.fior...@granturing.com
> To: christopher.br...@oracle.com, ko...@tresata.com
> Cc: user@spark.apache.org
> Sent: Sunday, February 14, 2016 8:27:09 AM GMT -05:00 US/Canada Eastern
> Subject: RE: coalesce and executor memory
Actually, rereading your email I see you're caching. But 'cache' uses
MEMORY_ONLY. Do you see errors about losing partitions as your job is running?
Are you sure you need to cache if you're just saving to disk? Can you try the
coalesce…
From: christopher.br...@oracle.com
Sent: Friday, February 12, 2016 8:34 PM
To: Koert Kuipers <ko...@tresata.com>; Silvio Fiorito <silvio.fior...@granturing.com>
Cc: user <user@spark.apache.org>
Subject: Re: coalesce and executor memory
Thank you for the responses. The map function just changes the format of
the record slightly, so I don't think that would be the cause of the
memory problem.
On Fri, Feb 12, 2016 at 11:10 PM, Koert Kuipers wrote:
> in spark, every partition needs to fit in the memory available to the core
> processing it.
>
That does not agree with my understanding of how it works. I think you
could do…
sorry i meant to say:
my way to deal with OOMs is almost always simply to increase the number of
partitions. maybe there is a better way that i am not aware of.
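Koert's usual fix (more partitions, so each task holds less at once) can be illustrated with a plain-Python model. This is not Spark code, and the record counts are made up; it just shows the arithmetic behind the advice.

```python
# Plain-Python illustration (not Spark code) of why increasing the number
# of partitions is a common OOM fix: the same data split across more
# partitions means each partition -- and therefore each task -- holds
# fewer records in memory at once.

def partition_sizes(total_records: int, num_partitions: int) -> list:
    """Spread total_records as evenly as possible over num_partitions
    (a uniform split, roughly what a well-distributed key gives you)."""
    base, extra = divmod(total_records, num_partitions)
    return [base + (1 if i < extra else 0) for i in range(num_partitions)]

total = 10_000_000
for n in (8, 64, 512):
    sizes = partition_sizes(total, n)
    print(f"{n:4d} partitions -> max records per task: {max(sizes):,}")
```

Going from 8 to 512 partitions shrinks the largest task's share of the data by 64x, which is exactly the lever being pulled when repartitioning to avoid an OOM.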
On Sat, Feb 13, 2016 at 11:38 PM, Koert Kuipers wrote:
thats right, its the reduce operation that makes the in-memory assumption,
not the map (although i am still suspicious that the map actually streams
from disk to disk record by record).
in reality though my experience is that if spark can not fit partitions in
memory it doesnt work well. i get…
Can anyone help me understand why using coalesce causes my executors to
crash with out of memory? What happens during coalesce that increases
memory usage so much?
If I do:
hadoopFile -> sample -> cache -> map -> saveAsNewAPIHadoopFile
everything works fine, but if I do:
hadoopFile -> sample …
Coalesce essentially reduces parallelism, so fewer cores are getting more
records. Be aware that it could also lead to loss of data locality, depending
on how far you reduce. Depending on what you're doing in the map operation, it
could lead to OOM errors. Can you give more details as to what…
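Silvio's first point, that coalescing below the cluster's core count both idles cores and piles more records onto each task, can be sketched with made-up numbers (the core and record counts below are assumptions, not from the thread):

```python
# Model of "fewer cores getting more records" after a coalesce.
# Not Spark code: just the parallelism arithmetic.

def task_load(total_records: int, num_partitions: int, total_cores: int):
    """Return (concurrently running tasks, records per task)."""
    concurrent = min(num_partitions, total_cores)   # a task maps 1:1 to a partition
    per_task = total_records // num_partitions
    return concurrent, per_task

total_records, cluster_cores = 1_000_000, 48
for parts in (200, 48, 8):
    concurrent, per_task = task_load(total_records, parts, cluster_cores)
    print(f"coalesce({parts:3d}): {concurrent:2d} cores busy, "
          f"~{per_task:,} records per task")
```

Once the partition count drops below the core count (here, coalesce(8) on 48 cores), 40 cores sit idle while each remaining task processes 25x more records than the 200-partition case.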
in spark, every partition needs to fit in the memory available to the core
processing it.
as you coalesce you reduce number of partitions, increasing partition size.
at some point the partition no longer fits in memory.
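The rough arithmetic behind this (dataset and per-task memory sizes below are invented for illustration) looks like:

```python
# Why coalescing eventually blows memory: a fixed dataset divided by
# fewer partitions gives bigger partitions, until one partition exceeds
# what a single task can hold. Sizes here are assumptions for the sketch.

DATASET_BYTES = 64 * 2**30      # assume a 64 GiB input
TASK_MEMORY_BYTES = 2 * 2**30   # assume ~2 GiB of usable heap per task

def fits(num_partitions: int) -> bool:
    per_partition = DATASET_BYTES / num_partitions
    return per_partition <= TASK_MEMORY_BYTES

for n in (512, 64, 16):
    per_part_gib = DATASET_BYTES / n / 2**30
    status = "ok" if fits(n) else "OOM risk"
    print(f"coalesce({n:3d}) -> {per_part_gib:5.2f} GiB/partition: {status}")
```

With these numbers, 64 partitions is right at the limit and coalescing to 16 puts 4 GiB in a 2 GiB task, which matches the "at some point the partition no longer fits" failure mode.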
On Fri, Feb 12, 2016 at 4:50 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
Thank you for the responses. The map function just changes the format of
the record slightly, so I don't think that would be the cause of the
memory problem.
So if I have 3 cores per executor, I need to be able to fit 3 partitions
per executor within whatever I specify for the executor
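That sizing intuition (concurrent tasks per executor x partition size, as a lower bound on executor memory) can be written out directly. The 3-core / 1.5 GiB numbers are assumptions, and the real requirement is higher once spark.memory.fraction and overhead are accounted for:

```python
# Lower-bound executor sizing: an executor running C concurrent tasks
# holds roughly C partitions in memory at once, so usable heap must be
# at least C * partition_size. This ignores spark.memory.fraction and
# off-heap overhead, which push the real figure up.

def min_executor_memory_gib(cores_per_executor: int,
                            partition_gib: float) -> float:
    return cores_per_executor * partition_gib

# Assumed example: 3 cores per executor, 1.5 GiB partitions.
need = min_executor_memory_gib(3, 1.5)
print(f"need at least ~{need} GiB of usable heap per executor")
```

Under these assumptions the executor needs at least ~4.5 GiB of usable heap, and coalescing (which grows partition_gib) raises that floor proportionally.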