Hello; I’m working with Spark on very large memory systems (2 TB+) and notice that Spark spills to disk during shuffle. Is there a way to force Spark to stay exclusively in memory for shuffle operations? The goal is to keep shuffle data either on the heap or in off-heap memory (in 1.6.x) and never touch the I/O subsystem. I am willing to have the job fail if it runs out of RAM.
Setting spark.shuffle.spill is deprecated in 1.6, and setting it to false does not work with the Tungsten sort manager in 1.5.x: "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is ignored by the tungsten-sort shuffle manager; its optimized shuffles will continue to spill to disk when necessary." If this is impossible via configuration changes, what code changes would be needed to accomplish it?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/In-Memory-Only-Spark-Shuffle-tp26661.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
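For reference, a minimal spark-defaults.conf sketch of the settings in question, assuming Spark 1.5/1.6. Note that none of these guarantee a purely in-memory shuffle: tungsten-sort ignores spark.shuffle.spill and will still spill when it decides it must. The /mnt/ramdisk path is a hypothetical tmpfs mount, shown only as a possible way to keep any spill off the physical I/O subsystem.

```
# Spark 1.5/1.6-era settings (sketch, not a guaranteed in-memory shuffle)
spark.shuffle.manager          sort
spark.shuffle.spill            false          # deprecated in 1.6; ignored by tungsten-sort
spark.memory.offHeap.enabled   true           # 1.6.x: allow off-heap execution memory
spark.memory.offHeap.size      1000000000000  # bytes; size is an assumption for a 2 TB box
spark.local.dir                /mnt/ramdisk   # hypothetical tmpfs mount so spill files land in RAM
```

Pointing spark.local.dir at a tmpfs is a workaround rather than a true fix: spills still go through the filesystem API, but the backing store is RAM rather than disk.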