HyukjinKwon commented on a change in pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory limit.
URL: https://github.com/apache/spark/pull/21977#discussion_r251220966
 
 

 ##########
 File path: docs/configuration.md
 ##########
 @@ -179,6 +179,18 @@ of the most common options to set are:
     (e.g. <code>2g</code>, <code>8g</code>).
   </td>
 </tr>
+<tr>
+ <td><code>spark.executor.pyspark.memory</code></td>
+  <td>Not set</td>
+  <td>
+    The amount of memory to be allocated to PySpark in each executor, in MiB
+    unless otherwise specified.  If set, PySpark memory for an executor will be
+    limited to this amount. If not set, Spark will not limit Python's memory use
 
 Review comment:
   @rdblue, which OS did you test on?
   
   It doesn't work in my case in non-YARN (local) mode on my Mac, and I suspect it's OS-specific.
   
   ```bash
   $ ./bin/pyspark --conf spark.executor.pyspark.memory=1m
   ```
   
   ```python
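   def ff(_):
       import os
       import numpy
       import psutil

       def get_used_memory():
           # Resident set size (bytes) of the current Python worker process.
           process = psutil.Process(os.getpid())
           return process.memory_info().rss

       # Allocate 8 GiB (2**30 unsigned 64-bit integers), far above the 1m limit.
       a = numpy.arange(1024 * 1024 * 1024, dtype="u8")
       return [get_used_memory()]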
   
   sc.parallelize([], 1).mapPartitions(ff).collect()
   ```
   
   ```python
   def ff(_):
       import sys, numpy
       a = numpy.arange(1024 * 1024 * 1024, dtype="u8")
       return [sys.getsizeof(a)]
   
   sc.parallelize([], 1).mapPartitions(ff).collect()
   ```
   
   Can you clarify how you tested in the PR description?
   
   FYI,
   
   On my Mac (the allocation succeeds without an error):
   
   ```python
   >>> import resource
   >>> size = 50 * 1024 * 1024
   >>> resource.setrlimit(resource.RLIMIT_AS, (size, size))
   >>> a = 'a' * size
   ```
   
   On CentOS Linux release 7.5.1804 (Core):
   
   ```python
   >>> import resource
   >>> size = 50 * 1024 * 1024
   >>> resource.setrlimit(resource.RLIMIT_AS, (size, size))
   >>> a = 'a' * size
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   MemoryError
   ```
   
   It looks like we should note this for clarification.
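
For anyone who wants to repeat the comparison above on another OS, here is a minimal sketch that runs the same `RLIMIT_AS` check in a child process, so the limit never affects the calling shell or interpreter. The helper name `check_rlimit_as_enforced` and the three result strings are hypothetical, not part of Spark. Unlike the interactive session above, the sketch lowers only the soft limit and restores it afterwards, so the child can still report its result.

```python
import subprocess
import sys
import textwrap

# Possible results reported by the (hypothetical) helper below.
OUTCOMES = {"enforced", "not enforced", "unavailable"}

# Script executed in a child interpreter, so the limit never touches the caller.
CHILD = textwrap.dedent("""
    import sys
    try:
        import resource  # Unix-only module
        size = 50 * 1024 * 1024
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Assumption: lower only the soft limit so it can be restored below.
        resource.setrlimit(resource.RLIMIT_AS, (size, hard))
    except (ImportError, AttributeError, ValueError, OSError):
        print("unavailable")
        sys.exit(0)
    try:
        a = b"a" * size  # same 50 MiB allocation as in the comment above
        outcome = "not enforced"
    except MemoryError:
        outcome = "enforced"
    finally:
        # Restore the original soft limit so printing cannot fail.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    print(outcome)
""")


def check_rlimit_as_enforced():
    """Report whether RLIMIT_AS stops a 50 MiB allocation on this OS."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        timeout=60,
    )
    outcome = proc.stdout.strip()
    # Anything unexpected (crash, empty output) is treated as "unavailable".
    return outcome if outcome in OUTCOMES else "unavailable"


if __name__ == "__main__":
    print(check_rlimit_as_enforced())
```

Going by the two sessions above, this would presumably report "enforced" on the CentOS machine and "not enforced" on the Mac, but I have only checked those two systems by hand.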
