From: Arun Ahuja aahuj...@gmail.com
Date: Monday, October 6, 2014 at 7:50 AM
To: Andrew Ash and...@andrewash.com
Cc: Mingyu Kim m...@palantir.com, user@spark.apache.org, Dennis Lawler dlaw...@palantir.com
Subject: Re: Larger heap leads to perf degradation due to GC
We have used the strategy that you suggested, Andrew: many workers
per machine with small heaps (< 20GB).
Using a large heap resulted in workers hanging or not responding (leading
to timeouts). The same dataset/job will fail for us (most often due to
Akka disassociation or fetch
Hi Mingyu,
Maybe we should be limiting our heaps to 32GB max and running multiple
workers per machine to avoid large GC issues.
For a 128GB memory, 32 core machine, this could look like:
SPARK_WORKER_INSTANCES=4
SPARK_WORKER_MEMORY=32g
SPARK_WORKER_CORES=8
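The arithmetic behind that layout can be sketched as a small helper (the function name and return format here are hypothetical, not part of Spark): cap each worker's heap at 32GB, and split memory and cores evenly across the resulting worker instances.

```python
def worker_layout(total_mem_gb, total_cores, max_heap_gb=32):
    # Enough worker instances so each worker's heap stays at or
    # below max_heap_gb (keeping GC pauses manageable).
    instances = max(1, total_mem_gb // max_heap_gb)
    return {
        "SPARK_WORKER_INSTANCES": instances,
        "SPARK_WORKER_MEMORY": f"{total_mem_gb // instances}g",
        "SPARK_WORKER_CORES": total_cores // instances,
    }

# The 128GB / 32-core machine from above:
print(worker_layout(128, 32))
# {'SPARK_WORKER_INSTANCES': 4, 'SPARK_WORKER_MEMORY': '32g', 'SPARK_WORKER_CORES': 8}
```

In practice you would leave some headroom for the OS and any other daemons rather than assigning every last gigabyte to workers, so treat this as a starting point, not a formula.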
Are people running with large (32GB+) heaps?