Andreas Neumann created SPARK-56572:
---------------------------------------
Summary: Inject the Spark session into python pipeline code
Key: SPARK-56572
URL: https://issues.apache.org/jira/browse/SPARK-56572
Project: Spark
Issue Type: Improvement
Components: Declarative Pipelines
Affects Versions: 4.2.0, 4.1.2
Reporter: Andreas Neumann
In Declarative Pipelines, all Python scripts run as separate modules but have
to share the same Spark session. This session is created by the framework, and
every script needs to bring it into its own scope by declaring
{code:python}
from pyspark.sql import SparkSession
spark = SparkSession.active()
{code}
This carries the risk that users create their own Spark session instead, which
could break dependencies between the Python scripts of the same pipeline. It
would be better to inject the session directly into each module, so that user
code does not need to obtain it.
Also, change `spark-pipelines init` to omit the `spark = SparkSession.active()`
line from the generated sample code.
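
One way the injection could work is sketched below. This is only an illustration of the idea, not the Declarative Pipelines implementation: `load_pipeline_module` and `FakeSession` are hypothetical names, and the stand-in session object replaces a real SparkSession so the sketch runs without Spark installed. The loader binds `spark` in the module namespace before the user code executes, so every script sees the same framework-created session.

```python
import types


def load_pipeline_module(name, source, session):
    """Run pipeline source in a fresh module that already has `spark` bound.

    Hypothetical helper: the real framework may load scripts differently.
    """
    module = types.ModuleType(name)
    # Inject the shared session before any user code runs.
    module.spark = session
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


class FakeSession:
    """Stand-in for the SparkSession created by the framework (assumption)."""


shared_session = FakeSession()

# User code refers to `spark` without importing or creating it.
user_source = "session_seen_by_script = spark\n"

mod_a = load_pipeline_module("script_a", user_source, shared_session)
mod_b = load_pipeline_module("script_b", user_source, shared_session)
```

With this approach, both modules end up holding the same session object, and the user scripts contain no session setup code at all.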