Have you seen this thread? http://search-hadoop.com/m/q3RTtRbEiIXuOOS&subj=Re+PySpark+issue+with+sortByKey+IndexError+list+index+out+of+range+
which led to SPARK-4384

On Mon, May 16, 2016 at 8:09 PM, kramer2...@126.com <kramer2...@126.com> wrote:

> I know the cache operation can cache data in memory/disk...
>
> But I would like to know whether other operations do the same.
>
> For example, I created a dataframe called df. The df is big, so when I run
> an action like:
>
>     df.sort(column_name).show()
>     df.collect()
>
> it throws an error like:
>
>     16/05/17 10:53:36 ERROR Executor: Managed memory leak detected; size = 2359296 bytes, TID = 15
>     16/05/17 10:53:36 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 15)
>     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>       File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>         process()
>       File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>         serializer.dump_stream(func(split_index, iterator), outfile)
>       File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>         vs = list(itertools.islice(iterator, batch))
>       File "<stdin>", line 1, in <lambda>
>     IndexError: list index out of range
>
> I want to know: is there any way or configuration to let Spark swap memory
> out to disk in this situation?
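On the direct question of swapping to disk: persisted blocks can spill if you choose a storage level that allows it, and shuffle data already spills, but collect() always materializes the entire result in the driver's Python process, and no storage level protects against that. A minimal sketch against PySpark 1.6 (the toy DataFrame, column names, and output path below are placeholders, not the poster's actual data):

    from pyspark import SparkContext, StorageLevel
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="spill-demo")
    sqlContext = SQLContext(sc)

    # Toy stand-in for the poster's large DataFrame.
    df = sqlContext.createDataFrame(
        [(i, i % 7) for i in range(100000)], ["id", "bucket"])

    # MEMORY_AND_DISK lets cached partitions spill to local disk when
    # executor memory runs out, instead of being dropped and recomputed.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    # show() only ships a handful of rows to the driver, so it is safe
    # even when the full DataFrame would never fit in driver memory.
    df.sort("bucket").show()

    # collect() pulls *every* row into the driver's Python process; no
    # storage level prevents that. Prefer a bounded take(), or write the
    # sorted result out and read it back as needed.
    preview = df.take(100)
    df.sort("bucket").write.parquet("/tmp/sorted_output")

That said, the IndexError in your traceback appears to come from a lambda in your own code ("<stdin>", line 1), not from an out-of-memory condition, which is why the SPARK-4384 thread above is worth a look.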