Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23055#discussion_r236494667
  
    --- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
    @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
       private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", 
true)
       // each python worker gets an equal part of the allocation. the worker 
pool will grow to the
       // number of concurrent tasks, which is determined by the number of 
cores in this executor.
    -  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
    +  private val memoryMb = if (Utils.isWindows) {
    --- End diff --
    
    I mean .. what I am disabling here is setting the Python memory limit on 
Windows; it looks like the Yarn side can still allocate the extra memory, but I 
haven't tested that.
    
    Ideally, I'm thinking we should disable the whole feature on Windows, but I 
couldn't test that either because I don't have a Windows Yarn cluster (I only 
have one Windows machine with a dev environment).
    
    I was trying to at least document that it doesn't work, because it's going 
to behave differently on Windows compared to other OSes, since Windows doesn't 
have the `resource` module (as of now, PySpark apps that use Python workers do 
not work on Windows at all, and we don't document that they do). I think it's 
more accurate to say it does not work because it's untested (or at least to 
simply say it's not guaranteed on Windows).
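    To illustrate the platform asymmetry being discussed, here is a minimal 
Python sketch of a `resource`-based memory cap in the spirit of what a PySpark 
worker does (the function name and cap logic are illustrative, not the actual 
worker code):

    ```python
    def try_set_memory_limit(limit_mb):
        """Best-effort address-space cap, roughly in the spirit of the
        PySpark worker (illustrative sketch, not the actual worker code).
        Returns True if a limit could be applied, False where unsupported."""
        try:
            # The `resource` module is POSIX-only; this import fails on Windows.
            import resource
        except ImportError:
            return False
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        target = limit_mb * 1024 * 1024
        # Only tighten the soft limit, and never raise it above the hard limit.
        if soft == resource.RLIM_INFINITY or soft > target:
            resource.setrlimit(resource.RLIMIT_AS, (target, hard))
        return True

    # On Windows this is a silent no-op: the JVM would still budget the extra
    # memory while the worker never enforces the cap.
    applied = try_set_memory_limit(1 << 20)  # a 1 TiB cap: harmless to apply
    ```

    This is exactly the state mismatch mentioned below: the JVM can allocate 
the overhead while the Python side quietly skips enforcement.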
    
    The reasons I prefer to check it on the JVM side instead are:
    1. It's quite usual to check things on the JVM side when possible and keep 
the worker simple. It does couple the worker to the JVM, but they are already 
strongly coupled.
    
    2. If the Yarn code can also be tested and turns out not to work, it should 
be disabled on the JVM side too.
    If we later find a workaround on Windows, we can fix the Python worker and 
enable it.
    
    3. Checking it on the JVM side ahead of time removes one inconsistent state 
where `memoryMb` is enabled on the JVM side but Python skips it.
