Hi, The other option to consider is using G1 GC, which should behave better with large heaps. But pointers are not compressed in heaps > 32 GB in size, so you may be better off staying under 32 GB.
Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Mon, Oct 6, 2014 at 8:08 PM, Mingyu Kim <m...@palantir.com> wrote: > Ok, cool. This seems to be general issues in JVM with very large heaps. I > agree that the best workaround would be to keep the heap size below 32GB. > Thanks guys! > > Mingyu > > From: Arun Ahuja <aahuj...@gmail.com> > Date: Monday, October 6, 2014 at 7:50 AM > To: Andrew Ash <and...@andrewash.com> > Cc: Mingyu Kim <m...@palantir.com>, "user@spark.apache.org" < > user@spark.apache.org>, Dennis Lawler <dlaw...@palantir.com> > Subject: Re: Larger heap leads to perf degradation due to GC > > We have used the strategy that you suggested, Andrew - using many workers > per machine and keeping the heaps small (< 20gb). > > Using a large heap resulted in workers hanging or not responding (leading > to timeouts). The same dataset/job for us will fail (most often due to > akka disassociated or fetch failures errors) with 10 cores / 100 executors, > 60 gb per executor while succceed with 1 core / 1000 executors / 6gb per > executor. > > When the job does succceed with more cores per executor and larger heap it > is usually much slower than the smaller executors (the same 8-10 min job > taking 15-20 min to complete) > > The unfortunate downside of this has been, we have had some large > broadcast variables which may not fit into memory (and unnecessarily > duplicated) when using the smaller executors. > > Most of this is anecdotal but for the most part we have had more success > and consistency with more executors with smaller memory requirements. > > On Sun, Oct 5, 2014 at 7:20 PM, Andrew Ash <and...@andrewash.com> wrote: > >> Hi Mingyu, >> >> Maybe we should be limiting our heaps to 32GB max and running multiple >> workers per machine to avoid large GC issues. >> >> For a 128GB memory, 32 core machine, this could look like: >> >> SPARK_WORKER_INSTANCES=4 >> SPARK_WORKER_MEMORY=32 >> SPARK_WORKER_CORES=8 >> >> Are people running with large (32GB+) executor heaps in production? I'd >> be curious to hear if so. >> >> Cheers! >> Andrew >> >> On Thu, Oct 2, 2014 at 1:30 PM, Mingyu Kim <m...@palantir.com> wrote: >> >>> This issue definitely needs more investigation, but I just wanted to >>> quickly check if anyone has run into this problem or has general guidance >>> around it. We’ve seen a performance degradation with a large heap on a >>> simple map task (I.e. No shuffle). We’ve seen the slowness starting around >>> from 50GB heap. (I.e. spark.executor.memoty=50g) And, when we checked the >>> CPU usage, there were just a lot of GCs going on. >>> >>> Has anyone seen a similar problem? >>> >>> Thanks, >>> Mingyu >>> >> >> >