[ https://issues.apache.org/jira/browse/SPARK-41379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643104#comment-17643104 ]
Apache Spark commented on SPARK-41379:
--------------------------------------

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/38906

> Inconsistency of spark session in DataFrame in user function for foreachBatch sink in PySpark
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41379
>                 URL: https://issues.apache.org/jira/browse/SPARK-41379
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Structured Streaming
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> [https://docs.databricks.com/_static/notebooks/merge-in-streaming.html]
> According to some manual testing against the code example above in PySpark,
> the sparkSession property of the given DataFrame does not appear to be the
> same as the cloned session in the streaming query. In other words,
> {{df.sparkSession}} does not seem to be the same as the cloned Spark session,
> which you can access via {{df._jdf.sparkSession()}}.
> Which session gets picked therefore depends on the actual implementation of
> each method in PySpark DataFrame, which users cannot know. If a different
> session than expected is picked, it opens a backdoor for bypassing
> restrictions (e.g. AQE), makes session-scoped resources (e.g. temp views)
> invisible, etc.
> So it is quite critical to sync the two so that they refer to the same session.
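A minimal sketch of how the mismatch described in the report could be observed from inside a foreachBatch user function. It is not taken from the issue or the pull request. The helper names `describe_sessions` and `check_batch` are hypothetical, and `_jsparkSession` and `_jdf` are private PySpark attributes that may change between versions. The comparison assumes that `==` on two Py4J Java objects delegates to Java `equals`, which for a Spark session should amount to reference equality.

```python
def describe_sessions(df):
    """Compare the two sessions reachable from a DataFrame.

    df.sparkSession is the Python-side session; df._jdf.sparkSession() is the
    JVM session the DataFrame was created with (per the report, the cloned
    session when the DataFrame comes from a streaming micro-batch).
    """
    # JVM session wrapped by the Python-side session (private attribute).
    python_side = df.sparkSession._jsparkSession
    # JVM session attached to the underlying Java DataFrame (private attribute).
    dataframe_side = df._jdf.sparkSession()
    return {
        "python_side": python_side,
        "dataframe_side": dataframe_side,
        "same": python_side == dataframe_side,
    }


def check_batch(batch_df, batch_id):
    # User function for foreachBatch: report whether the two sessions agree.
    info = describe_sessions(batch_df)
    print(f"batch {batch_id}: sessions match = {info['same']}")
```

Assuming a streaming DataFrame named `streaming_df` (hypothetical), the check could be attached with `streaming_df.writeStream.foreachBatch(check_batch).start()`. On affected versions the report suggests the two sessions would differ.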