[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237725742

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

OK, let me remove that check here for now. I think we can discuss the broader question of how this should be implemented later.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237723350

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

I think it's way better to have the JVM not care and have the Python side handle the setting according to its capabilities. It's a smaller change, it's more localized, and it reduces the number of changes needed if this has to be tweaked again in the future.

I don't see the consistency argument you're making. You picked one example that is very different from this one, since it actually affects the behavior of the JVM code. Could it have been developed differently? Of course. But we're not discussing that feature here. And there are so few of these cases that you really can't draw a pattern from them.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237716807

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

Just for consistency, and to avoid a situation where the value is set on the JVM side but behaves differently on the Python side.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237716592

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

> That is completely different. Because that affects how the Java side talks to the Python side.

It's not completely different, though. Ideally we could have one script that handles both cases; it deals with OS-specific differences on the Python side.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237716326

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

I really don't understand what you said above. If the check is done on the Python side, what is the exact reason why you need it on the JVM side?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237716012

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

Yea, so the check can be done on the Python side and the JVM side. Yes, I am explicitly disabling it on Windows. I think the proper assumption is that we note it doesn't work when it hasn't been tested. If someone finds it working, then a follow-up patch can explicitly enable it with proper documentation.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237715420

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

What I meant by "What do you mean "python can't fail"?" was quoting `Python should not fail if resource cannot be imported` from @rdblue. I left the last comment before seeing your comment, @vanzin. Let me leave an answer for that soon.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237714454

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

What is the behavior you want? What do you mean "python can't fail"?

If the memory limit is set by the user, you have two options when the Python side cannot do anything with it:

- ignore it (the current patch)
- raise an error

Both can be done from the Python side and do not require any checks in the JVM.

Checking in the JVM is the wrong thing. You're hardcoding the logic that this feature can never work on Windows. And if someone finds out that it can, then you'll have to change it, and that check was basically useless, since an equivalent check exists on the Python side.
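For reference, a minimal sketch of those two Python-side options, assuming the `has_resource_module` flag from the diff above; the `handle_memory_limit` helper, its `strict` toggle, and the error message are illustrative, not part of the patch:

```python
import os

# Set once at import time, as in the diff above.
try:
    import resource  # Unix-only module
    has_resource_module = True
except ImportError:
    has_resource_module = False


def handle_memory_limit(strict=False):
    memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))
    if memory_limit_mb <= 0:
        return  # limit not configured
    if not has_resource_module:
        if strict:
            # Option 2: surface the misconfiguration.
            raise RuntimeError("memory limit is set but 'resource' is unavailable")
        return  # Option 1 (the current patch): silently ignore the setting
    # Platform supports it: enforce the limit on the address space.
    new_limit = memory_limit_mb * 1024 * 1024  # MiB -> bytes
    resource.setrlimit(resource.RLIMIT_AS, (new_limit, new_limit))
```

Either choice is local to the worker; the JVM behaves identically in both cases.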
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237715129

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

> Let me just follow majority

A -1 is a -1. It's not majority-based (this is not a release vote).
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237715021

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

> For instance, forking in Python is disallowed in Python on Windows

That is *completely* different, because it affects how the Java side talks to the Python side. The feature in this PR has no bearing on what happens on the JVM side at all.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237714583

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

Can I add some more people for input? Let me just follow the majority. cc @ueshin, @BryanCutler.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237714288

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

For instance, forking in Python is disallowed in Python on Windows, so we do that control on the JVM side.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237713255

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

-1 for adding only the Python check, from my side. I don't understand why it's controversial either. Python can't fail, because `resource` exists on Unix-based systems; the only exception case is Windows, which we already know about on the JVM side. Configuration control is better done on the JVM side when possible, and I don't think we should let this feature work differently on Windows; it should be tested before we document that we support it.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237188935

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

I'm -1 too. Remove the check from the Java side. Let Python deal with whether it can enforce resource limits or not. The JVM should just tell Python what the limit should be. I don't understand why that is so controversial.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237169532

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

I'm -1 on this change. I think the correct behavior is that Python should not fail if resource cannot be imported, and the JVM should not do anything differently.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r237021410

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific module.
+has_resource_module = True
+try:
+    import resource
+except ImportError:
+    has_resource_module = False
--- End diff --

@vanzin, I'm going to remove this flag. Does it then look okay to you?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236926113

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I don't think I'm the only one, though.

> Why is this code needed?

I explained it above multiple times. See above.

> it's not doing anything useful if you keep the python check around

See above. I want to delete the Python-side checks, but I added them per a review comment.

> The JVM doesn't understand exactly what Python supports or not, it's better to let the python code decide that.

Not really. The resource module is a Python built-in module that exists on Unix-based systems; it just does not exist on Windows.

> You say we should disable the feature on Windows. The python-side changes already do that.

I explained this above. See the first change I proposed, 2d3315a. It relies on the environment variable.

> We should not remove the extra memory requested from the resource manager just because you're running on Windows - you'll still need that memory, you'll just get a different error message if you end up using more than you requested.

Yea, I know it would probably work. My question is: has it ever been tested? One failure case was found, so it looks a bit odd to document that it works. It's not even tested; shall we make it simple rather than have it work differently until it's tested?
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236737983

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

It's a minor point, but you seem to be the one making a big fuss about it.

Why is this code needed? It's not doing anything useful if you keep the python check around. The JVM doesn't understand exactly what Python supports or not; it's better to let the Python code decide that.

You say we should disable the feature on Windows. The python-side changes already do that. We should not remove the extra memory requested from the resource manager just because you're running on Windows - you'll still need that memory, you'll just get a different error message if you end up using more than you requested.

> If we can find a workaround on Windows, we can fix the Python worker and enable it.

Exactly, and that should not require making any changes on the JVM side.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236692833

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

BTW, I don't get why we should argue like this over this one. Isn't it a minor point?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236512196

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I mean, even if the allocation succeeds in YARN but the Python worker has no control over it, what's the point?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236494667

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I mean, what I am disabling here is setting the Python memory limit on Windows; it looks like the YARN side can still allocate more, but that is not tested. I'm thinking we should ideally disable the whole feature on Windows, but I couldn't test it on Windows either, because I don't have a Windows YARN cluster (I only have one Windows machine with a dev environment). I was trying to at least document that it doesn't work, because it's going to work differently on OSes that don't have the `resource` module (as things stand, PySpark apps that use a Python worker do not work on Windows at all, and we don't necessarily document that it works on Windows). I think it's more correct to say it does not work because it's not tested (or at least simply say it's not guaranteed on Windows).

The reasons I prefer to check it on the JVM side instead are:

1. It's quite usual to check things on the JVM side when possible and keep the worker simple. It could make the worker coupled to the JVM, but it's already strongly coupled.
2. If the YARN code can also be tested, and it doesn't work, it should also be disabled on the JVM side. If we can find a workaround on Windows, we can fix the Python worker and enable it.
3. If it's checked on the JVM side up front, it removes one state in the JVM: `memoryMb` being enabled on the JVM side while Python skips it.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236488396

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

@vanzin, I mean the first change I did at https://github.com/apache/spark/pull/23055/commits/2d3315a7dab429abc4d9ef5ed7f8f5484e8421f1
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236487153

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> The point Ryan and I are making is that there's no point in having this check in the Java code, because the Python code already handles it. It doesn't work on Windows, and the updated Python code handles that.

My point is that we should remove the Python condition. Also, I wonder whether this whole feature, introduced at https://github.com/apache/spark/pull/21977, works at all, since it looks like it was never tested.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236485186

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I don't understand what you're getting at here. The point Ryan and I are making is that there's no point in having this check in the Java code, because the Python code already handles it. It doesn't work on Windows, and the updated Python code handles that.

The only shortcoming here is passing an env variable to Python that won't be used if you're on Windows. That's really super trivial and not a big deal at all.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236484520

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

To be honest, I'm skeptical about whether it really works on Windows.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236484322

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

Thanks, @vanzin. However, it brings complexity on the other hand; maintaining code for Windows is actually quite costly. We can simply keep it atomic (it works or it doesn't) by disabling it on Windows, rather than complicating the feature by introducing another state.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236483417

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> Strictly we should move it into the JVM rather than adding more controls at workers.

There's a reason why the value is sent to the worker. It enables a feature that, when available, gives you better error information. With the python code, if the app runs over the specified limit, you'll get an error from Python saying it's using more memory than it should. Without it, you'll get a generic error from the resource manager that your app exceeded its memory allocation, and you won't know exactly what caused it (Java? Python? something else?).
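As a small illustration of that error-message difference: once `RLIMIT_AS` is lowered, an oversized allocation fails inside the Python worker itself with a `MemoryError`, instead of the whole container being killed by the resource manager. A minimal sketch, Unix-only, with arbitrary sizes:

```python
import resource

# Cap the process address space at roughly 512 MiB (illustrative value).
limit = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

try:
    buf = bytearray(1024 * 1024 * 1024)  # attempt to allocate 1 GiB
except MemoryError:
    # The failure surfaces in Python, so the user knows it was the
    # Python side (not the JVM) that exceeded its share of memory.
    print("Python worker exceeded its memory limit")
```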
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236481476

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> The allocation on the JVM side is still the correct behavior.

Also, has it ever been tested on Windows?
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236480711

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

Documentation should be "For platforms where the `resource` API is available, python will limit its resource usage". The allocation on the JVM side is still the correct behavior.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236479633

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

The benefit is, as I said, less complexity and an explicit condition/documentation. How are we going to document this then? "It's automatically disabled when the `resource` package is not found in the specified Python executable on the executor side"?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236479186

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

If more conditions are found that require skipping this, then we should handle them later on the JVM side if possible. If not, we can discuss it then. If it's absolutely required to skip on the worker side, we can do that later and remove the JVM-side check.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236478702

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

Strictly, we should move it into the JVM rather than adding more controls at the workers. Otherwise we will end up sending a bunch of data and environment variables to the Python workers. It already sends the environment variable that indicates this configuration is set, and I was thinking what we should do is disable it on a certain condition, rather than checking for the package and skipping on the Python worker side. I would rather remove the Python side's check then.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236361258

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

BTW, I agree with Ryan that this seems unnecessary, also because the checks on either side are not the same (checks for Windows on the Java side, checks for a specific module on the Python side).
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236345625

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> functionality is disabled in Python side

The only functionality that is disabled is limiting the memory space. The allocation for Python is still requested from resource managers. Setting the environment property tells Python how much memory it was allocated, no matter how that is used or enforced.

> code consistency - usually the configuration is dealt with in JVM side if possible

The JVM is handling the setting by requesting that memory for Python and passing on the amount requested to Python. The fact that the python process can't enforce a limit doesn't affect how the JVM side should behave. This needlessly couples JVM and Python behavior with an assumption that may not be true in the future.

> Why are you so against about disabling it in JVM side?

There is no benefit to disabling this. It is more code with no purpose, and it makes assumptions about what Python can or cannot do that aren't obvious. What if pandas implements some method to spill to disk to limit memory consumption? Will implementers of that future feature know that the environment variable is not set when running on Windows?

This adds complexity for no benefit because it doesn't change either the resource allocation in the JVM or the behavior of the python process. It only avoids sending valuable information. I see no reason for it.
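A sketch of the decoupling being argued for here: the JVM always publishes the allocation through the environment variable, and self-limiting is just one optional consumer of that information. The spill-threshold consumer below is hypothetical, mirroring the pandas example above; `spill_threshold_bytes` and the 0.9 factor are made-up names and values:

```python
import os

# The JVM always tells Python how much memory it was allocated.
memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))

if memory_limit_mb > 0:
    limit_bytes = memory_limit_mb * 1024 * 1024
    try:
        import resource
        # Consumer 1: self-limit where the platform supports it.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    except ImportError:
        pass  # no enforcement here, but the allocation is still known
    # Consumer 2 (hypothetical): a future spill-to-disk mechanism could
    # key off the same value without caring whether enforcement happened.
    spill_threshold_bytes = int(limit_bytes * 0.9)
```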
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r235524421

--- Diff: python/pyspark/worker.py ---
@@ -22,7 +22,12 @@
 import os
 import sys
 import time
-import resource
+# 'resource' is a Unix specific package.
+has_resource_package = True
--- End diff --

nit: is it technically a module, not a package?
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r235523238

--- Diff: docs/configuration.md ---
@@ -189,7 +189,7 @@ of the most common options to set are:
 limited to this amount. If not set, Spark will not limit Python's memory use
 and it is up to the application to avoid exceeding the overhead memory space
 shared with other non-JVM processes. When PySpark is run in YARN or Kubernetes, this memory
-is added to executor resource requests.
+is added to executor resource requests. This configuration is not supported on Windows.
--- End diff --

Maybe add `NOTE: ...`
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r235226292

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I am doing this because:

1. Code consistency - usually the configuration is dealt with on the JVM side if possible; otherwise we should send the configuration to the worker side and, ideally, process it in the Python worker. What we need to do is explicitly disable the configuration on a certain condition.
2. If we do it only on the Python side, it's going to introduce a third status in the JVM (`memoryMb`): the configuration value exists in the JVM (which means it's enabled) but the functionality is disabled on the Python side. Avoiding that reduces the complexity.

Why are you so against disabling it on the JVM side? I don't see a big downside to doing this. I added a flag as you said as well. For clarification, it's a rather minor point.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r235082191

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

There is no configuration change needed on the JVM side. The JVM should communicate to Python how much memory it is allocated. If Python can limit itself to that amount, then that's fine. If the JVM doesn't expect Python to be able to limit itself, why would it not tell Python how much memory it was allocated? There is no benefit to making this change that I can see.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234838984

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

For the current status, there's no diff. I'm just trying to match what we have usually been doing - configuration control is usually handled on the JVM side - and I propose to disallow this configuration on Windows. I think it's a pretty minor point now because I added the flag as well ...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234691652

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

@HyukjinKwon, what should the JVM side do differently if `resource` is not available? I don't think it should do anything different. It should still allocate the python memory region when requesting resources from schedulers. The only difference is that python isn't self-limiting.

Do you have an example of something that the JVM should change when running on Windows?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234398745

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> My point is that if resource can't be loaded for any reason, the code shouldn't fail. As it is, if resource can't be loaded then that is handled, but if the memory limit is set then the worker will still try to use it. That's what I think is brittle. There should be a flag for whether to attempt to use the resource API, based on whether it was loaded.

Ah, so the point is that the condition for the existence of `resource` might not be clear, so we should have the flag to keep it from failing in that case. Yup, makes sense. Let me add a flag.

> If the worker operates as I described, then why make any changes on the JVM side?

I am making some changes on the JVM side so that we can explicitly disable this on a certain condition. For instance, suppose we don't make a change on the JVM side and only make the changes in the Python `worker`. Later, we could have some other changes on the JVM side referring to this configuration, which can be problematic. If we keep the configuration somehow on the JVM side, it basically means we could have another status for this configuration, rather than just being disabled or enabled, which we would have to take care of.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234286173

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

My point is that if resource can't be loaded for any reason, the code shouldn't fail. As it is, if resource can't be loaded then that is handled, but if the memory limit is set then the worker will still try to use it. That's what I think is brittle. There should be a flag for whether to attempt to use the resource API, based on whether it was loaded.

If the worker operates as I described, then why make any changes on the JVM side? Why avoid telling the worker how much memory it has?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234086569

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

> JVM could set the request

This is handled in the JVM, so it wouldn't break. The `worker` itself is strongly coupled to the JVM.

You mean the case when the client is a Windows machine and it uses a Unix-based cluster, right? I think this is what the fix already does - the `PythonRunner`s are already created on the executor side, so the case where the client is on Windows isn't affected.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234084002

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I mean that it is brittle to try to use `resource` if the JVM has set the property. You handle the `ImportError`, but the JVM could set the request and Python would break again.

I think that this should not be entirely disabled on Windows. Resource requests to YARN or other schedulers should include this memory. The only feature that should be disabled is the resource limiting on the python side.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234081475

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I see. I think the point of view is a bit different. What I was trying to do is this: we declare that this configuration is not supported on Windows, meaning we disable it on Windows from the start, on the JVM side - because it's the JVM that launches the Python workers. So I was trying to leave the control to the JVM.

> It seems brittle to disable this on the JVM side and rely on it here. Can we also set a flag in the ImportError case and also check that here?

However, in a way, it's a bit odd to say it's brittle, because we're already relying on that.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080578

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff --

I don't think this is necessary. If `resource` can't be imported for any reason, then memory will not be limited in python. But the JVM side shouldn't be what determines whether that happens. The JVM should do everything the same way - even requesting memory from schedulers like YARN, because that space should still be allocated as python memory, even if python can't self-limit.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234080290

--- Diff: python/pyspark/worker.py ---
@@ -268,9 +272,11 @@ def main(infile, outfile):
         # set up memory limits
         memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))
-        total_memory = resource.RLIMIT_AS
-        try:
-            if memory_limit_mb > 0:
+        # 'PYSPARK_EXECUTOR_MEMORY_MB' should be undefined on Windows because it depends on
+        # the resource package, which is a Unix specific package.
+        if memory_limit_mb > 0:
--- End diff --

It seems brittle to disable this on the JVM side and rely on it here. Can we also set a flag in the ImportError case and also check that here?
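A minimal sketch of the flag-based check being suggested here, combining the import-time flag with the limit check so that an unsupported platform is handled without any JVM-side involvement. The structure follows the diffs in this thread; the `set_memory_limit` helper name and the warning text are illustrative:

```python
from __future__ import print_function

import os
import sys

try:
    import resource  # 'resource' is a Unix specific module
    has_resource_module = True
except ImportError:
    has_resource_module = False


def set_memory_limit():
    memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))
    # Check both conditions: the env variable may be set even on platforms
    # that cannot enforce it, and that must not crash the worker.
    if memory_limit_mb > 0 and has_resource_module:
        new_limit = memory_limit_mb * 1024 * 1024
        try:
            (soft, hard) = resource.getrlimit(resource.RLIMIT_AS)
            if soft == resource.RLIM_INFINITY or new_limit < soft:
                resource.setrlimit(resource.RLIMIT_AS, (new_limit, new_limit))
        except (resource.error, OSError, ValueError) as e:
            # Not every system supports setrlimit; warn rather than fail.
            print("WARN: failed to set memory limit: %s" % e, file=sys.stderr)
```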