Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21977#discussion_r207601410

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -51,6 +52,17 @@ private[spark] class PythonRDD(
   val bufferSize = conf.getInt("spark.buffer.size", 65536)
   val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
+  val memoryMb = {
--- End diff --

It's been a while since I spent much time thinking about how we launch our Python worker processes. Maybe it would make sense to add a comment here explaining the logic a bit more? Based on the documentation in `PythonWorkerFactory`, it appears we make the fork/not-fork decision not based on whether `reuseWorker` is set, but on whether we're running on Windows. Is that the logic this block was attempting to handle?
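For concreteness, here's a minimal sketch of the OS-based gating I'm describing. This is not the actual Spark source: the helper name, the `spark.python.use.daemon` conf key, and its default are my recollection/assumptions, shown only to illustrate the decision I think this block interacts with:

```scala
import org.apache.spark.SparkConf

// Sketch (not the real PythonWorkerFactory code) of the fork/not-fork decision:
// the daemon (fork-based) launch path is gated on the OS, not on
// spark.python.worker.reuse.
def useDaemon(conf: SparkConf): Boolean = {
  // Assumed conf key, for illustration; the real gating conf may differ.
  val daemonEnabled = conf.getBoolean("spark.python.use.daemon", defaultValue = true)
  // Windows can't fork, so fall back to launching a fresh worker per task there,
  // regardless of whether worker reuse is enabled.
  !System.getProperty("os.name").startsWith("Windows") && daemonEnabled
}
```

If that's the intended interaction, a comment tying `memoryMb` back to that decision would make this block much easier to follow.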