Robin created SPARK-37690:
-----------------------------

             Summary: Recursive view `df` detected (cycle: `df` -> `df`)
                 Key: SPARK-37690
                 URL: https://issues.apache.org/jira/browse/SPARK-37690
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Robin

In Spark 3.2.0, you can no longer reuse the same name for a temporary view. This change is backwards incompatible, and it means a common way of running pipelines of SQL queries no longer works.

The following is a simple reproducible example that works in Spark 2.x and 3.1.2, but not in 3.2.0:

{code:python}
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """
df = spark.sql(sql)
df.createOrReplaceTempView("df")

sql = """ SELECT * FROM df """
df = spark.sql(sql)
df.createOrReplaceTempView("df")

sql = """ SELECT * FROM df """
df = spark.sql(sql)
{code}

The following error is now produced:

{code:python}
AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
{code}

I'm reasonably sure this change is unintentional in 3.2.0, since it breaks a lot of legacy code, and the `createOrReplaceTempView` method is named explicitly such that replacing an existing view should be allowed.

An internet search suggests other users have run into similar problems, e.g. [here|https://community.databricks.com/s/question/0D53f00001Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using]
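
One possible way to sidestep the error while keeping the pipeline style is to register each step under its own view name, so that no view is ever replaced by a query that reads from it. The following is only a minimal sketch under that assumption: `run_sql_pipeline`, the `{prev}` placeholder, and the `df_<step>` naming scheme are hypothetical helpers invented for illustration, not part of Spark, and the approach has not been verified against 3.2.0 here.

```python
def run_sql_pipeline(spark, queries, base_name="df"):
    """Run SQL queries in order, registering each result under a fresh view name.

    `spark` is expected to be a SparkSession (anything with a .sql() method
    whose result has .createOrReplaceTempView()). Every query after the first
    may refer to the previous step's view through the "{prev}" placeholder.
    Hypothetical helper for illustration; not a Spark API.
    """
    result = None
    prev_view = None
    for step, template in enumerate(queries):
        # Substitute the previous view name only once one exists.
        sql = template.format(prev=prev_view) if prev_view else template
        result = spark.sql(sql)
        # A distinct name per step means no view is redefined in terms of itself.
        prev_view = f"{base_name}_{step}"
        result.createOrReplaceTempView(prev_view)
    return result, prev_view
```

With the example from this report, the three queries would be passed as `["SELECT id as col_1, rand() AS col_2 FROM RANGE(10)", "SELECT * FROM {prev}", "SELECT * FROM {prev}"]`, and the views would be registered as `df_0`, `df_1`, and `df_2`. Because the sketch uses `str.format`, queries containing literal braces would need them doubled.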