Hello,

I'm working with Spark on very large memory systems (2TB+) and notice that
Spark spills to disk during shuffles. Is there a way to force Spark to stay
exclusively in memory when doing shuffle operations? The goal is to keep
the shuffle data either on the heap or in off-heap memory (in 1.6.x) and
never touch the IO subsystem. I am willing to have the job fail if it runs
out of RAM.
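
For concreteness, this is roughly the kind of 1.6 configuration I have in
mind; the off-heap settings below are the standard spark.memory.offHeap.*
options as I understand them, and the size is just a placeholder for my
hardware:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the 1.6 memory settings I have in mind (size is a placeholder
// for a 2TB+ box). As far as I can tell these only size the off-heap
// execution/storage pools; they do not stop the shuffle from spilling.
val conf = new SparkConf()
  .setAppName("in-memory-shuffle")
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1t")   // placeholder size
val sc = new SparkContext(conf)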

Setting spark.shuffle.spill to false is deprecated in 1.6 and is ignored by
the tungsten-sort shuffle manager in 1.5.x:
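
(For reference, I had been setting it roughly like this; as I understand it
this is equivalent to passing --conf spark.shuffle.spill=false to
spark-submit:)

import org.apache.spark.SparkConf

// How I had been trying to disable spilling (pre-1.6); with the
// tungsten-sort manager this only produces the warning quoted below.
val conf = new SparkConf()
  .set("spark.shuffle.manager", "tungsten-sort")  // explicit in 1.5.x
  .set("spark.shuffle.spill", "false")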

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this
is ignored by the tungsten-sort shuffle manager; its optimized shuffles will
continue to spill to disk when necessary.”

If this is impossible via configuration changes, what code changes would be
needed to accomplish this?



