I know the cache operation can cache data in memory and/or on disk...
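
Something like this, as I understand it (just a sketch; I'm not certain MEMORY_AND_DISK is the right level for my case):

from pyspark import StorageLevel

# cache the DataFrame, spilling partitions that don't fit in memory to disk
df.persist(StorageLevel.MEMORY_AND_DISK)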

But I would like to know whether other operations do the same.

For example, I created a DataFrame called df. The df is big, so when I run
an action like:

df.sort(column_name).show()
df.collect()

it throws an error like:
    16/05/17 10:53:36 ERROR Executor: Managed memory leak detected; size = 2359296 bytes, TID = 15
    16/05/17 10:53:36 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 15)
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "<stdin>", line 1, in <lambda>
    IndexError: list index out of range


I want to know: is there any way or configuration setting to let Spark spill
memory to disk in this situation?
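
Something like this is what I have in mind (the config names are from the
Spark 1.6 docs on unified memory management; I'm not sure they actually
address this case):

from pyspark import SparkConf, SparkContext

# a sketch: tune the unified memory manager at startup
conf = (SparkConf()
        .set("spark.memory.fraction", "0.75")        # heap share for execution + storage
        .set("spark.memory.storageFraction", "0.5")) # storage share protected from eviction
sc = SparkContext(conf=conf)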


