[ https://issues.apache.org/jira/browse/SPARK-52854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandy Ryza updated SPARK-52854:
-------------------------------
        Parent: SPARK-52856
    Issue Type: Sub-task  (was: Improvement)

> Prevent setting catalog and database on the session within Pipelines Python
> definition files
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-52854
>                 URL: https://issues.apache.org/jira/browse/SPARK-52854
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Declarative Pipelines
>    Affects Versions: 4.1.0
>            Reporter: Sandy Ryza
>            Priority: Major
>
> Setting the Spark session's default catalog and database is an imperative
> construct that can cause friction and unexpected behavior within a pipeline
> declaration. For example, it makes pipeline behavior sensitive to the order
> in which Python files are imported, which can be unpredictable. Mechanisms
> for setting the Spark catalog and database for pipelines already exist:
> * The catalog and database settings in the pipeline spec
> * The name argument on the dataset decorators, which accepts a
>   fully-qualified name
> Raising an error when someone tries to set a catalog or database in this
> situation would avoid this unpredictable behavior.
>
> The ways to set the catalog and database from Python are:
> * spark.catalog.setCurrentCatalog
> * spark.sql("USE CATALOG")
> * spark.catalog.setCurrentDatabase
> * spark.sql("USE DATABASE")

--
This message was sent by Atlassian Jira
(v8.20.10#820010)