GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/21977

    SPARK-25004: Add spark.executor.pyspark.memory limit.

    ## What changes were proposed in this pull request?
    
    This adds `spark.executor.pyspark.memory` to configure Python's address space limit, [`resource.RLIMIT_AS`](https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS). Limiting Python's address space allows Python to participate in memory management. In practice, we see fewer cases of Python taking too much memory because it doesn't know to run garbage collection; as a result, YARN kills fewer containers. The limit also improves error messages, so users know that Python is consuming too much memory:
    
    ```
      File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
        fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
      File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
        comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
      File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
        permutations = sorted(permutations, reverse=True)
      MemoryError
    ```
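
    For illustration, a minimal sketch of how a worker process could apply such a limit. The `PYSPARK_EXECUTOR_MEMORY_MB` environment variable and the surrounding plumbing are placeholders for this sketch, not the names used by the actual implementation:

    ```python
    import os
    import resource  # Unix-only; address space limits are unavailable on Windows

    # Hypothetical variable name, used only for this sketch.
    limit_mb = os.environ.get("PYSPARK_EXECUTOR_MEMORY_MB")
    if limit_mb is not None:
        limit_bytes = int(limit_mb) * 1024 * 1024
        # Cap the worker's total address space. Allocations beyond the cap
        # raise MemoryError inside Python instead of growing until YARN
        # kills the container.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    ```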
    
    The new pyspark memory setting is used to increase the requested YARN container memory, instead of sharing overhead memory between Python and off-heap JVM activity.
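
    For example, the setting would be supplied like any other Spark configuration property; a minimal usage sketch (the `2g` size-string value format is an assumption here, check the config's documentation for the accepted units):

    ```python
    from pyspark.sql import SparkSession

    # Request a per-executor Python memory limit alongside the usual confs.
    spark = (SparkSession.builder
        .appName("pyspark-memory-limit-example")
        .config("spark.executor.pyspark.memory", "2g")
        .getOrCreate())
    ```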
    
    ## How was this patch tested?
    
    Tested memory limits in our YARN cluster and verified that MemoryError is raised when the limit is exceeded.
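
    The mechanism can also be checked standalone, independent of Spark: restrict the address space with `resource.setrlimit` and over-allocate, which should raise MemoryError. The 256 MB figure below is arbitrary, and exact thresholds depend on the interpreter's baseline footprint; RLIMIT_AS enforcement is Linux-specific and may not apply on macOS:

    ```python
    import resource

    # Limit the process's address space to 256 MB (soft and hard limits).
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024, 256 * 1024 * 1024))

    try:
        buf = bytearray(512 * 1024 * 1024)  # needs more than the limit allows
    except MemoryError:
        print("MemoryError raised as expected")
    ```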

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-25004-add-python-memory-limit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21977.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21977
    
----
commit 19cd9c5cce4420729074a0976b129889d70fd56c
Author: Ryan Blue <blue@...>
Date:   2018-05-09T18:34:50Z

    SPARK-25004: Add spark.executor.pyspark.memory limit.

----

