HyukjinKwon commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin 
Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#issuecomment-513675357
 
 
   > 1) Do we think there might be any problems with blocking around how we do 
streaming in Python?
   
   I am not sure blocking would be a problem, because this mode always creates 
a new thread. The number of threads might matter if too many jobs are executed 
on the driver side, since we're not reusing JVM threads in this new mode. But I 
think the same concern applies to the Scala API itself anyway.
   
   > 2) When you say "experimental" do you mean experimental in PySpark or 
experimental in Py4J as well?
   
   I meant experimental in PySpark; it's pretty much a core change.
   
   > 3) Is this only to support the Job Group or is there other benefits that 
you think we would get from this? (Because if it's just fixing the job group ID 
we might be able to find a simpler way to track that information in the Python 
thread and pass it through each time)?
   
   There seems to be no simpler way to fix it. The one explicit problem found 
so far is, yes, the local properties. However, reusing threads (the existing 
behavior) can cause multiple potential problems wherever we use thread-local or 
inheritable thread-local state in the JVM on the driver side.
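   
   To illustrate the thread-reuse concern, here is a toy simulation in plain 
Python. It is only a sketch and not PySpark or Py4J code: a 
`ThreadPoolExecutor` worker stands in for a JVM thread, a `threading.local` 
stands in for JVM-side thread-local state such as the local properties, and 
the names `SharedBackend`, `PinnedBackend`, `set_job_group`, `get_job_group` 
and `observe` are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for JVM-side thread-local state (e.g. local properties).
jvm_local = threading.local()


def set_job_group(group):
    jvm_local.group = group


def get_job_group():
    return getattr(jvm_local, "group", None)


class SharedBackend:
    """Every caller is served by one reused worker thread."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args):
        return self._pool.submit(fn, *args).result()


class PinnedBackend:
    """Each calling thread gets its own dedicated worker thread."""

    def __init__(self):
        self._pools = {}
        self._lock = threading.Lock()

    def call(self, fn, *args):
        # Key on the Thread object itself; numeric thread ids can be recycled.
        caller = threading.current_thread()
        with self._lock:
            if caller not in self._pools:
                self._pools[caller] = ThreadPoolExecutor(max_workers=1)
            pool = self._pools[caller]
        return pool.submit(fn, *args).result()


def observe(backend):
    """Thread A sets a job group; return what thread B then sees."""
    seen = {}

    def worker_a():
        backend.call(set_job_group, "group-a")

    def worker_b():
        seen["b"] = backend.call(get_job_group)

    for target in (worker_a, worker_b):
        t = threading.Thread(target=target)
        t.start()
        t.join()
    return seen["b"]


shared_result = observe(SharedBackend())  # "group-a": A's state leaks into B
pinned_result = observe(PinnedBackend())  # None: B has its own worker thread
```

   With the shared worker, the value set on behalf of thread A is visible to 
thread B. With one worker per caller, each caller's thread-local state stays 
isolated, at the cost of one extra thread per caller.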
   
