Kristin Cowalcijk created SEDONA-706:
----------------------------------------
Summary: Python DataFrame API have problem working in
multi-threaded environment
Key: SEDONA-706
URL: https://issues.apache.org/jira/browse/SEDONA-706
Project: Apache Sedona
Issue Type: Bug
Reporter: Kristin Cowalcijk
Fix For: 1.7.1
This issue was reported in
[https://github.com/apache/sedona/issues/1771|https://github.com/apache/sedona/issues/1771].
The user wanted to call ST functions using the DataFrame API, but an exception was
raised.
Further investigation showed that the DataFrame API relies on
{{SparkSession.getActiveSession}} to construct Spark SQL UDF calls. The "active
session" is thread-local, and {{SparkSession.getActiveSession}} only returns
a valid session in the thread that started the Spark session. I believe that the
Python backend is handling requests in a different thread, so that thread has no
active session.
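The thread-local behavior described above can be illustrated without Spark. The following is only an analogy built on the standard library's {{threading.local}}; the names {{set_active_session}}, {{get_active_session}} and {{lookup_in_worker}} are hypothetical stand-ins and are not part of PySpark or Sedona.

```python
import threading

# Stand-in for a thread-local "active session" slot. This is NOT Spark code,
# only a sketch of how thread-local state behaves across threads.
_active = threading.local()


def set_active_session(session):
    # Visible only to the thread that calls this function.
    _active.session = session


def get_active_session():
    # Returns None in any thread that never called set_active_session().
    return getattr(_active, "session", None)


def lookup_in_worker():
    # Look up the "active session" from a freshly started worker thread.
    result = []
    worker = threading.Thread(target=lambda: result.append(get_active_session()))
    worker.start()
    worker.join()
    return result[0]


# The main thread registers a session, as the thread starting Spark would.
set_active_session("session-created-in-main-thread")

main_view = get_active_session()   # the registering thread sees its session
worker_view = lookup_in_worker()   # a different thread sees nothing
```

In this sketch the worker thread gets {{None}}, which mirrors the situation in the report, where the request-handling thread finds no active session.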
What we need for calling Sedona functions is a JVMView object. We can obtain
this object from {{SparkContext._jvm}} instead of {{spark._jvm}}. This does not
use any thread-local state and works correctly whenever there is an active
Spark context in the current process.
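A minimal sketch of that lookup is shown here. It is not the actual patch: {{get_jvm_view}} and its {{context_cls}} parameter are hypothetical names introduced for illustration, and {{_jvm}} is a private PySpark attribute, so relying on it is an assumption taken from the description above.

```python
def get_jvm_view(context_cls=None):
    """Return the process-wide JVMView without consulting the active session.

    `context_cls` is a hypothetical hook so the lookup can be exercised with a
    stand-in class; by default the real pyspark.SparkContext class is used.
    """
    if context_cls is None:
        # Imported lazily so this sketch can be loaded without PySpark installed.
        from pyspark import SparkContext
        context_cls = SparkContext

    # `_jvm` is read from the class, not from a thread-local session object.
    jvm = getattr(context_cls, "_jvm", None)
    if jvm is None:
        raise RuntimeError("No active Spark context in the current process")
    return jvm
```

Because the attribute is read from the class rather than from a session object, the result should not depend on which thread performs the call, provided a Spark context has already been started in the process.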
--
This message was sent by Atlassian Jira
(v8.20.10#820010)