HyukjinKwon commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#issuecomment-513675357

> 1) Do we think there might be any problems with blocking around how we do streaming in Python?

I am not sure about blocking, because this mode always creates a new thread. The number of threads might matter if too many jobs are executed on the driver side, since we're not reusing JVM threads in this new mode. But I think the same concern applies to the Scala API itself anyway.

> 2) When you say "experimental" do you mean experimental in PySpark or experimental in Py4J as well?

I meant experimental in PySpark. It's pretty much a core change.

> 3) Is this only to support the Job Group or is there other benefits that you think we would get from this? (Because if it's just fixing the job group ID we might be able to find a simpler way to track that information in the Python thread and pass it through each time)?

It seems there's no simpler way to fix it. The one explicit problem found so far is, yes, the local properties. However, reusing threads (the way we do it today) can cause multiple potential problems wherever we use thread-local or inheritable thread-local state in the JVM on the driver side.
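To make the thread-reuse concern concrete, here is a minimal sketch in plain Python. It is only an analogy, not Spark or Py4J code: `_jvm_local`, `set_job_group`, `get_job_group`, `reused_thread_mode`, and `pinned_thread_mode` are hypothetical names, with `threading.local` standing in for JVM-side thread-local state such as local properties, and a single pooled worker standing in for a reused JVM thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for JVM-side thread-local state (e.g. local properties).
_jvm_local = threading.local()


def set_job_group(group):
    """Store a job group on whichever thread executes this call."""
    _jvm_local.job_group = group


def get_job_group():
    """Read the job group visible to the thread executing this call."""
    return getattr(_jvm_local, "job_group", None)


def reused_thread_mode():
    """Two logical callers share one pooled worker thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Caller A sets its job group on the shared worker.
        pool.submit(set_job_group, "group-A").result()
        # Caller B never set a group, but runs on the same worker
        # and therefore sees A's value.
        return pool.submit(get_job_group).result()


def pinned_thread_mode():
    """Each logical caller gets its own dedicated thread."""
    results = {}

    def caller(name, group):
        if group is not None:
            set_job_group(group)
        results[name] = get_job_group()

    threads = [
        threading.Thread(target=caller, args=("A", "group-A")),
        threading.Thread(target=caller, args=("B", None)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(reused_thread_mode())
    print(pinned_thread_mode())
```

In the reused case, caller B observes the value that caller A set, because thread-local state follows the worker thread rather than the logical caller. With one dedicated thread per caller the state stays isolated, at the cost of creating a thread per caller, which is the trade-off noted in the answer to question 1.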