Robin created SPARK-37690:
-----------------------------

             Summary: Recursive view `df` detected (cycle: `df` -> `df`)
                 Key: SPARK-37690
                 URL: https://issues.apache.org/jira/browse/SPARK-37690
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Robin


In Spark 3.2.0, you can no longer reuse the same name for a temporary view
when the replacement DataFrame was itself built from that view.
This change is backwards incompatible and breaks a common way of running
pipelines of SQL queries. The following is a simple reproducible example
that works in Spark 2.x and 3.1.2, but not in 3.2.0:

{code:python}
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """
df = spark.sql(sql)
df.createOrReplaceTempView("df")

sql = """ SELECT * FROM df """
df = spark.sql(sql)
df.createOrReplaceTempView("df")

sql = """ SELECT * FROM df """
df = spark.sql(sql)
{code}

The following error is now produced:

{code:python}
AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
{code}

I'm reasonably sure this change is unintentional in 3.2.0, since it breaks a lot
of legacy code and the name of the `createOrReplaceTempView` method explicitly
implies that replacing an existing view is allowed. An internet search
suggests other users have run into similar problems, e.g.
[here|https://community.databricks.com/s/question/0D53f00001Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using]
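
Until this is resolved, one possible stopgap is to give every pipeline step its own view name, so that no view is ever replaced by a query that reads from it. The sketch below is only an illustration of that idea: `fresh_view_name` is a hypothetical helper, not part of PySpark, and the Spark calls are left as comments so the naming logic can be read (and run) on its own.

```python
import itertools

# Hypothetical helper, not part of PySpark: hands out a new suffix on every
# call so that no two pipeline steps ever share a temporary view name.
_view_ids = itertools.count()


def fresh_view_name(base: str) -> str:
    """Return a name such as 'df_0', 'df_1', ... that was not issued before."""
    return f"{base}_{next(_view_ids)}"


# Intended use with the reproduction above (assumes an existing `spark` session):
#
#   view = fresh_view_name("df")
#   spark.sql("SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)") \
#        .createOrReplaceTempView(view)
#
#   next_view = fresh_view_name("df")
#   spark.sql(f"SELECT * FROM {view}").createOrReplaceTempView(next_view)
#   view = next_view
```

Because each step reads from the previous name and registers under a new one, the view definitions form a chain rather than a cycle. This does not restore the old behaviour of reusing a single name; it only sidesteps the check.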
  



