There are a bunch of tricks noted in the Tuning Guide<http://spark.incubator.apache.org/docs/latest/tuning.html#memory-tuning>. You may have seen them already, but I thought it's still worth mentioning for the record.
Besides those, if you are concerned about consistent latency (that is, low variability in job processing times), then using the concurrent-mark-and-sweep (CMS) GC is recommended. Instead of big stop-the-world GC pauses, there are many smaller pauses. This reduction in variability comes at the cost of processing throughput, though, so that's a tradeoff.

TD

On Thu, Jan 16, 2014 at 11:35 AM, Kay Ousterhout <[email protected]> wrote:

> Hi all,
>
> I'm finding that Java GC can be a major performance bottleneck when running
> Spark at high (>50% or so) memory utilization. What GC tuning have people
> tried for Spark and how effective has it been?
>
> Thanks!
>
> Kay
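For the record, a minimal sketch of how one might switch the Spark JVMs over to CMS. The exact mechanism for passing JVM options varies by Spark version and deployment mode; SPARK_JAVA_OPTS in spark-env.sh is one common way in older releases, and the specific flags below are standard HotSpot options, not something Spark itself mandates:

```shell
# Sketch (spark-env.sh): enable CMS instead of the default throughput collector.
# -XX:+UseConcMarkSweepGC  -> many short concurrent pauses instead of long
#                             stop-the-world full GCs (lower latency variance,
#                             at some cost in overall throughput)
# The GC-logging flags make it possible to verify the pause behavior.
export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

Checking the resulting GC logs for pause durations is the usual way to confirm the tradeoff is actually paying off for your workload.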
