anew commented on code in PR #55493:
URL: https://github.com/apache/spark/pull/55493#discussion_r3270618887


##########
docs/declarative-pipelines-programming-guide.md:
##########
@@ -180,6 +180,20 @@ Your pipelines implemented with the Python API must import 
this module. It's rec
 from pyspark import pipelines as dp
 ```
 
+### The Spark Session in Python Pipelines
+
+The Spark session is automatically injected by the pipeline framework and is 
available as `spark` in every Python pipeline file — no initialization code is 
required. You can use `spark` directly without importing or constructing a 
`SparkSession`:
+
+```python
+from pyspark import pipelines as dp
+
+@dp.materialized_view
+def my_view():
+    return spark.range(10)
+```
+
+Previous versions of Declarative Pipelines required explicitly assigning the 
session with `spark = SparkSession.active()` at the top of each pipeline file. 
This is still allowed and continues to work correctly. However, if you do 
assign the session explicitly, `SparkSession.active()` is the only supported 
way to do so — any other method of obtaining or constructing a `SparkSession` 
is unsupported and may lead to unexpected behavior.

Review Comment:
   Good suggestion; I made the change. However, I was hoping that this could go 
into 4.2, and I worded it accordingly.
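
For readers wondering how a name like `spark` can be usable in a file that never defines it, here is a minimal pure-Python sketch of the general idea of pre-binding a global before executing a file. It is an illustration only and not Spark's actual mechanism: `FakeSession`, `PIPELINE_SOURCE`, and `load_pipeline_file` are hypothetical names invented for this example, and no pyspark API is used.

```python
# Hypothetical stand-in for a session object; this is NOT pyspark's SparkSession.
class FakeSession:
    def range(self, n):
        # Return a plain list instead of a DataFrame, to keep the sketch stdlib-only.
        return list(range(n))


# Source of a pretend pipeline file: it uses `spark` without importing or defining it.
PIPELINE_SOURCE = """
def my_view():
    return spark.range(3)
"""


def load_pipeline_file(source, session):
    """Execute `source` with `spark` already bound in its global namespace."""
    namespace = {"spark": session}  # assumption: the loader injects the name this way
    exec(compile(source, "<pipeline>", "exec"), namespace)
    return namespace


ns = load_pipeline_file(PIPELINE_SOURCE, FakeSession())
result = ns["my_view"]()
print(result)
```

Under this model, an explicit `spark = ...` assignment at the top of the file simply rebinds the same global name, which is one way to see why the guide restricts the explicit form to `SparkSession.active()`.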



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


