HyukjinKwon commented on a change in pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory limit.
URL: https://github.com/apache/spark/pull/21977#discussion_r251220966
##########
File path: docs/configuration.md
##########
@@ -179,6 +179,18 @@ of the most common options to set are:
     (e.g. <code>2g</code>, <code>8g</code>).
   </td>
 </tr>
+<tr>
+  <td><code>spark.executor.pyspark.memory</code></td>
+  <td>Not set</td>
+  <td>
+    The amount of memory to be allocated to PySpark in each executor, in MiB
+    unless otherwise specified. If set, PySpark memory for an executor will be
+    limited to this amount. If not set, Spark will not limit Python's memory use

Review comment:
@rdblue, which OS did you test on? It doesn't work for me in non-YARN (local) mode on my Mac, and I suspect the behavior is OS-specific. I started the shell with:

```bash
$ ./bin/pyspark --conf spark.executor.pyspark.memory=1m
```

and then ran either of these jobs:

```python
def ff(_):
    def get_used_memory():
        import os
        import psutil
        process = psutil.Process(os.getpid())
        info = process.memory_info()
        return info.rss

    import numpy
    # Allocates 8 GiB (1024**3 unsigned 64-bit ints), far above the 1m limit.
    a = numpy.arange(1024 * 1024 * 1024, dtype="u8")
    return [get_used_memory()]

sc.parallelize([], 1).mapPartitions(ff).collect()
```

```python
def ff(_):
    import sys
    import numpy
    a = numpy.arange(1024 * 1024 * 1024, dtype="u8")
    return [sys.getsizeof(a)]

sc.parallelize([], 1).mapPartitions(ff).collect()
```

Could you clarify in the PR description how you tested this?

FYI, on my Mac the allocation goes through even after setting `RLIMIT_AS`:

```python
>>> import resource
>>> size = 50 * 1024 * 1024
>>> resource.setrlimit(resource.RLIMIT_AS, (size, size))
>>> a = 'a' * size
```

whereas on CentOS Linux release 7.5.1804 (Core) the same allocation raises `MemoryError`:

```python
>>> import resource
>>> size = 50 * 1024 * 1024
>>> resource.setrlimit(resource.RLIMIT_AS, (size, size))
>>> a = 'a' * size
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
```

It looks like we should note this OS dependence in the documentation for clarification.
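The Mac/CentOS contrast comes down to whether the kernel enforces `RLIMIT_AS`. As a rough sketch, assuming (as in the transcripts in this comment) that the cap is applied with `resource.setrlimit(resource.RLIMIT_AS, ...)`, the probe below converts a MiB value to bytes, sets that cap in a child process, and reports whether allocating past it raises `MemoryError`. The helpers `mib_to_bytes` and `rlimit_as_enforced` are hypothetical names for illustration, not Spark or PySpark APIs, and the probe is Unix-only because it relies on the `resource` module.

```python
import subprocess
import sys

# Hypothetical helper (not a Spark API): a value given in MiB, e.g. 50,
# converted to the byte count that resource.setrlimit expects.
def mib_to_bytes(mib: int) -> int:
    return mib * 1024 * 1024

# Runs in a child process so the cap never affects the calling interpreter.
# Exit codes: 0 = over-limit allocation succeeded (limit not enforced),
# 2 = the OS refused to set the limit, 3 = MemoryError was raised.
_CHILD = """
import resource, sys
limit = int(sys.argv[1])
try:
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
except (ValueError, OSError):
    sys.exit(2)
try:
    buf = bytearray(2 * limit)
except MemoryError:
    sys.exit(3)
sys.exit(0)
"""

def rlimit_as_enforced(limit_mib: int = 50) -> bool:
    """Return True if allocating past an RLIMIT_AS cap raises MemoryError."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, str(mib_to_bytes(limit_mib))],
        capture_output=True,
    )
    if result.returncode == 3:
        return True
    if result.returncode in (0, 2):
        return False
    raise RuntimeError(
        f"unexpected exit code {result.returncode}: {result.stderr!r}"
    )

if __name__ == "__main__":
    print(sys.platform, "RLIMIT_AS enforced:", rlimit_as_enforced())
```

On Linux this should report `True`; going by the Mac transcript above, macOS would be expected to report `False`, which is the OS dependence the documentation would need to call out.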