[ https://issues.apache.org/jira/browse/SPARK-52854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated SPARK-52854:
-------------------------------
        Parent: SPARK-52856
    Issue Type: Sub-task  (was: Improvement)

> Prevent setting catalog and database on the session within Pipelines Python 
> definition files
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-52854
>                 URL: https://issues.apache.org/jira/browse/SPARK-52854
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Declarative Pipelines
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Priority: Major
>
> Setting the Spark session's default catalog and database is an imperative 
> construct that can cause friction and unexpected behavior from within a 
> pipeline declaration. For example, it makes pipeline behavior sensitive to the 
> order in which Python files are imported, which can be unpredictable. There 
> are already mechanisms for setting the Spark catalog and database for 
> pipelines:
>  * The catalog and database settings in the pipeline spec
>  * The name argument on the dataset decorators accepts a fully-qualified name
> Raising an error when someone tries to set the catalog or database from 
> within a pipeline definition file would avoid this unpredictable behavior.
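To make the import-order problem concrete, here is a toy simulation in plain Python (no Spark required). `FakeSession`, `file_a`, `file_b` and `load` are hypothetical stand-ins: each "definition file" is modeled as a function run against one shared session, and, as an assumption made for illustration, an unqualified dataset name is resolved against the session's current catalog and database at the moment it is registered. The defaults `spark_catalog` and `default` mirror Spark's built-in session catalog and database names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FakeSession:
    """Hypothetical stand-in for a SparkSession: tracks only the current catalog/database."""
    current_catalog: str = "spark_catalog"
    current_database: str = "default"

    def set_current_database(self, name: str) -> None:
        # Imperative, session-wide mutation (analogous to spark.catalog.setCurrentDatabase).
        self.current_database = name

    def qualify(self, name: str) -> str:
        # Assumption: unqualified names resolve against the session state *at this moment*.
        if "." in name:
            return name
        return f"{self.current_catalog}.{self.current_database}.{name}"


def file_a(session: FakeSession, registry: Dict[str, str]) -> None:
    # Hypothetical definition file that switches the session's database.
    session.set_current_database("sales")


def file_b(session: FakeSession, registry: Dict[str, str]) -> None:
    # Hypothetical definition file that registers a dataset under an unqualified name.
    registry["orders"] = session.qualify("orders")


def load(files: List[Callable[[FakeSession, Dict[str, str]], None]]) -> Dict[str, str]:
    """Run the definition files in the given order against one shared session."""
    session = FakeSession()
    registry: Dict[str, str] = {}
    for run_file in files:
        run_file(session, registry)
    return registry


print(load([file_a, file_b]))  # {'orders': 'spark_catalog.sales.orders'}
print(load([file_b, file_a]))  # {'orders': 'spark_catalog.default.orders'}
```

Neither order is wrong on its own, which is the point: the dataset's final location depends on file loading order rather than on anything the definitions themselves state.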
>  
> The ways to set the catalog and database from Python are:
>  * spark.catalog.setCurrentCatalog
>  * spark.sql("USE CATALOG")
>  * spark.catalog.setCurrentDatabase
>  * spark.sql("USE DATABASE")
>  
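As a rough illustration of the proposed behavior, the sketch below guards all four entry points listed above while definition files are being loaded. It is plain Python with no Spark dependency: `FakeSession`, `FakeCatalog`, `block_catalog_changes` and `CatalogChangeNotAllowed` are hypothetical names and not the actual fix for this issue; only the method names `setCurrentCatalog`/`setCurrentDatabase` mirror PySpark's `spark.catalog` API. SQL text is screened with a simple case-insensitive regex for statements beginning with `USE`, which also catches the bare `USE name` form.

```python
import contextlib
import re
from typing import Iterator, Optional


class FakeCatalog:
    """Hypothetical stand-in for spark.catalog: tracks the current catalog/database."""

    def __init__(self) -> None:
        self.current_catalog = "spark_catalog"
        self.current_database = "default"

    def setCurrentCatalog(self, catalogName: str) -> None:
        self.current_catalog = catalogName

    def setCurrentDatabase(self, dbName: str) -> None:
        self.current_database = dbName


class FakeSession:
    """Hypothetical stand-in for a SparkSession exposing .catalog and .sql()."""

    def __init__(self) -> None:
        self.catalog = FakeCatalog()

    def sql(self, query: str) -> Optional[str]:
        # Toy handling of "USE CATALOG x" / "USE DATABASE x"; other SQL is echoed back.
        m = re.match(r"\s*USE\s+(CATALOG|DATABASE)\s+(\w+)", query, re.IGNORECASE)
        if m is None:
            return query
        if m.group(1).upper() == "CATALOG":
            self.catalog.setCurrentCatalog(m.group(2))
        else:
            self.catalog.setCurrentDatabase(m.group(2))
        return None


class CatalogChangeNotAllowed(RuntimeError):
    """Hypothetical error raised when a definition file changes catalog/database."""


# Any statement that starts with the USE keyword (USE x, USE CATALOG x, USE DATABASE x, ...).
_USE_STMT = re.compile(r"^\s*USE\b", re.IGNORECASE)


@contextlib.contextmanager
def block_catalog_changes(session: FakeSession) -> Iterator[None]:
    """While active, reject every listed way of changing the current catalog/database."""

    def reject(*_args, **_kwargs):
        raise CatalogChangeNotAllowed(
            "Setting the current catalog or database is not allowed in pipeline "
            "definition files; use the pipeline spec or a fully-qualified name."
        )

    original_sql = session.sql

    def guarded_sql(query: str, *args, **kwargs):
        if _USE_STMT.match(query):
            reject()
        return original_sql(query, *args, **kwargs)

    # Shadow the methods with instance attributes; deleting them restores the originals.
    session.catalog.setCurrentCatalog = reject
    session.catalog.setCurrentDatabase = reject
    session.sql = guarded_sql
    try:
        yield
    finally:
        del session.catalog.setCurrentCatalog
        del session.catalog.setCurrentDatabase
        del session.sql


# Usage: simulate loading definition files with the guard active.
spark = FakeSession()
with block_catalog_changes(spark):
    try:
        spark.sql("use database sales")
    except CatalogChangeNotAllowed as err:
        print("rejected:", err)
    print(spark.sql("SELECT 1"))  # prints: SELECT 1

spark.sql("USE DATABASE sales")  # allowed again once the guard has exited
print(spark.catalog.current_database)  # prints: sales
```

Matching on raw SQL text is the weakest part of this sketch: a statement with a leading SQL comment would slip past the regex, so a sturdier guard would more likely inspect the parsed command instead.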



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
