I know the cache operation can cache data in memory/disk... But I would like to know: will other operations do the same?
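For context, the only operation I know of that controls this explicitly is persist() with a storage level. A minimal sketch of what I mean (the toy data and the "value" column here are made up just for illustration):

    from pyspark import SparkContext, StorageLevel
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="persist-example")
    sqlContext = SQLContext(sc)

    # Toy DataFrame standing in for my real (much bigger) df.
    df = sqlContext.createDataFrame([(i,) for i in range(1000)], ["value"])

    # MEMORY_AND_DISK: cached partitions that do not fit in memory
    # are written to local disk instead of failing or being recomputed.
    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()  # action to actually materialize the cache

What I am unsure about is whether operations that are not cached this way (sort, collect, etc.) will spill to disk on their own.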
For example, I created a DataFrame called df. The df is big, so when I run an action like:

    df.sort(column_name).show()
    df.collect()

it throws an error like:

16/05/17 10:53:36 ERROR Executor: Managed memory leak detected; size = 2359296 bytes, TID = 15
16/05/17 10:53:36 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 15)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
IndexError: list index out of range

I want to know: is there any way or configuration to let Spark swap memory out to disk in this situation?
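In case it helps to make the question concrete, these are the kinds of settings I imagined might control spilling in Spark 1.6 (just a sketch; I am not sure these are the right knobs, and the values below are only the documented defaults plus an arbitrary driver size):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("spill-config-sketch")
            # Fraction of JVM heap shared by execution and storage
            # under the Spark 1.6 unified memory manager.
            .set("spark.memory.fraction", "0.75")
            # Portion of that region protected from eviction by execution.
            .set("spark.memory.storageFraction", "0.5")
            # collect() pulls all rows to the driver, so driver memory
            # may matter as much as executor memory here.
            .set("spark.driver.memory", "4g"))
    sc = SparkContext(conf=conf)

(I realize spark.driver.memory may need to be passed to spark-submit rather than set in code when running in client mode, and the 4g value is arbitrary.)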