GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/21977
SPARK-25004: Add spark.executor.pyspark.memory limit.

## What changes were proposed in this pull request?

This adds `spark.executor.pyspark.memory` to configure Python's address space limit, [`resource.RLIMIT_AS`](https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS). Limiting Python's address space allows Python to participate in memory management. In practice, we see fewer cases of Python taking too much memory because it doesn't know to run garbage collection. This results in YARN killing fewer containers. This also improves error messages, so users know that Python is consuming too much memory:

```
  File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
    fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
  File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
    comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
  File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
    permutations = sorted(permutations, reverse=True)
MemoryError
```

The new PySpark memory setting is used to increase the requested YARN container memory, instead of sharing overhead memory between Python and off-heap JVM activity. (A sketch of the RLIMIT_AS mechanism and a usage example follow at the end of this message.)

## How was this patch tested?

Tested memory limits in our YARN cluster and verified that MemoryError is thrown.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-25004-add-python-memory-limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21977.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21977

----

commit 19cd9c5cce4420729074a0976b129889d70fd56c
Author: Ryan Blue <blue@...>
Date: 2018-05-09T18:34:50Z

    SPARK-25004: Add spark.executor.pyspark.memory limit.

----
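Below is a minimal sketch of the mechanism the PR describes: capping a Python process's address space with `resource.RLIMIT_AS` so that oversized allocations raise `MemoryError` inside Python rather than growing until YARN's memory monitor kills the container. The `memory_mb` value and the standalone setup are illustrative assumptions, not the patch's actual code, which wires the limit through PySpark's worker startup from the new configuration.

```python
import resource  # Unix-only; unavailable on Windows

# Hypothetical per-worker limit for illustration; the real patch derives
# this from spark.executor.pyspark.memory via the worker's environment.
memory_mb = 512
limit_bytes = memory_mb * 1024 * 1024

soft, hard = resource.getrlimit(resource.RLIMIT_AS)

# Only tighten the limit; never loosen an existing, stricter soft limit.
if soft == resource.RLIM_INFINITY or soft > limit_bytes:
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

# From here on, an allocation that would push the address space past the
# cap fails with MemoryError, giving the user a Python traceback instead
# of an opaque container kill.
try:
    too_big = bytearray(limit_bytes)  # deliberately exceeds the cap
except MemoryError:
    print("allocation rejected by RLIMIT_AS")
```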
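And a hedged usage sketch of how an application might request the new limit once this patch is in place. The application name and the `2g` value are assumptions for illustration; the byte-size string follows Spark's usual configuration format, and per the PR description the value is added to the YARN container memory request rather than carved out of the existing overhead.

```python
from pyspark.sql import SparkSession

# Request a 2 GiB address-space limit for each executor's Python workers.
spark = (
    SparkSession.builder
    .appName("pyspark-memory-limit-demo")  # hypothetical app name
    .config("spark.executor.pyspark.memory", "2g")
    .getOrCreate()
)
```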