On Mon, Apr 27, 2015 at 7:36 AM, Shuai Zheng szheng.c...@gmail.com wrote:
Thanks. So may I ask what your configuration is for more/smaller
executors on r3.8xlarge, and how much memory you eventually decided
to give one executor without impacting performance (for example: 64g)?
To: Dean Wampler
Cc: Shuai Zheng; user@spark.apache.org
Subject: Re: Slower performance when bigger memory?
FWIW, I ran into a similar issue on r3.8xlarge nodes and opted for more/smaller
executors. Another observation was that one large executor results in less
overall read throughput from S3 (using Amazon's EMRFS implementation) in case
that matters to your application.
You can resort to Serialized storage (still in memory) of your RDDs - this
will obviate the need for GC since the RDD elements are stored as serialized
objects off the JVM heap (most likely in Tachyon, which is a distributed
in-memory file system used by Spark internally)
Also review the Object
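For reference, serialized caching is just a different storage level passed to
persist(). A minimal sketch in Scala; the record type, input path, and Kryo
registration are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type, for illustration only.
case class Event(id: Long, payload: String)

val conf = new SparkConf()
  .setAppName("serialized-cache-sketch")
  // Kryo usually shrinks the serialized cache well below what
  // Java serialization produces.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[Event]))

val sc = new SparkContext(conf)

val events = sc.textFile("hdfs:///path/to/events") // placeholder path
  .map(line => Event(line.hashCode.toLong, line))

// MEMORY_ONLY_SER keeps each cached partition as one serialized byte
// array, so the GC sees a few large objects instead of many small ones.
events.persist(StorageLevel.MEMORY_ONLY_SER)

One nuance: MEMORY_ONLY_SER itself still lives on the JVM heap; it is the
separate StorageLevel.OFF_HEAP level that stores the bytes in Tachyon.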
This is not about the GC issue itself. The memory is
On Friday, April 24, 2015, Evo Eftimov evo.efti...@isecc.com wrote:
You can resort to Serialized storage (still in memory) of your RDDs – this
will obviate the need for GC since the RDD elements are stored as
serialized objects off the JVM heap
FWIW, I ran into a similar issue on r3.8xlarge nodes and opted for
more/smaller executors. Another observation was that one large executor
results in less overall read throughput from S3 (using Amazon's EMRFS
implementation) in case that matters to your application.
-Sven
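In case concrete numbers help, here is a sketch of one possible
more/smaller-executors split for an r3.8xlarge (32 vCPUs, 244 GB). The
figures are illustrative assumptions, not the values Sven used, and
spark.executor.instances assumes you are running on YARN:

import org.apache.spark.SparkConf

// Illustrative split: 8 executors per node at 24g each (plus overhead)
// consumes the 32 cores and well under 244 GB, and keeps every heap far
// below the ~64GB range where GC pauses tend to hurt.
val conf = new SparkConf()
  .setAppName("small-executors-sketch")
  .set("spark.executor.instances", "8") // YARN-only setting; assumption
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "24g")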
On Thu, Apr 23, 2015 at
Shuai:
Please take a look at:
http://blog.takipi.com/garbage-collectors-serial-vs-parallel-vs-cms-vs-the-g1-and-whats-new-in-java-8/
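If you want to experiment with the collectors that post compares, they can be
switched per executor through Spark's extraJavaOptions setting; a minimal
sketch (the G1 choice and pause target are assumptions, not a recommendation
from this thread):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("gc-switch-sketch")
  // Replace the default parallel collector with G1 on the executors.
  .set("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -XX:MaxGCPauseMillis=200")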
On Apr 23, 2015, at 10:18 AM, Dean Wampler deanwamp...@gmail.com wrote:
JVMs often have significant GC overhead with heaps bigger than 64GB. You
might try your experiments with configurations below this threshold.
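One way to confirm whether GC really is the cost before resizing anything is
to enable GC logging on the executors and look for long full-GC pauses (the
GC Time column in the Spark UI shows the same signal). A small sketch using
standard HotSpot flags:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("gc-logging-sketch")
  // Standard HotSpot GC logging; output lands in each executor's stderr.
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")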