Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23055#discussion_r235226292
  
    --- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
    @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
       private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", 
true)
       // each python worker gets an equal part of the allocation. the worker 
pool will grow to the
       // number of concurrent tasks, which is determined by the number of 
cores in this executor.
    -  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
    +  private val memoryMb = if (Utils.isWindows) {
    --- End diff --
    
    I am doing this because:
    
    1. Code consistency - usually a configuration is handled on the JVM 
side if possible; otherwise we would have to send the configuration to the 
worker side and then process it in the Python worker. What we need to do here 
is explicitly disable the configuration under a certain condition.
    
    2. If we do it only on the Python side, it introduces a third state for 
`memoryMb` in the JVM: the configuration value exists in the JVM (which 
implies it's enabled) but the functionality is disabled on the Python side. 
Handling it on the JVM side avoids that extra state and reduces the complexity.
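
    The JVM-side gating argued for above can be sketched roughly like this. 
This is an illustrative simplification, not the actual `PythonRunner` code: 
the names `ConfigGate` and `resolveMemoryMb` are hypothetical, and 
`configured` / `isWindows` stand in for `conf.get(PYSPARK_EXECUTOR_MEMORY)` 
and `Utils.isWindows`.

    ```scala
    // Hypothetical sketch: resolve the configuration on the JVM side so the
    // Python worker never receives a value whose feature is disabled anyway.
    object ConfigGate {
      def resolveMemoryMb(configured: Option[Long], isWindows: Boolean): Option[Long] =
        if (isWindows) None   // disable explicitly on platforms where the feature is unsupported
        else configured       // otherwise pass through whatever the user set

      def main(args: Array[String]): Unit = {
        println(resolveMemoryMb(Some(512L), isWindows = false)) // Some(512)
        println(resolveMemoryMb(Some(512L), isWindows = true))  // None
      }
    }
    ```

    With this shape there are only two states the Python side can observe: 
the limit is set, or it is absent; the "set but ignored" state never exists.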
    
    Why are you so against disabling it on the JVM side? I don't see a big 
downside to doing this. I also added a flag as you suggested. For 
clarification, it's a rather minor point.


---
