[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)
[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578901#comment-17578901 ]

Daniel Darabos commented on SPARK-37690:

It's fixed in Spark 3.3.0. (https://github.com/apache/spark/commit/1d068cef38f2323967be83045118cef0e537e8dc)

Does upgrading count as a workaround? Or on 3.2 you can avoid the cycle error by saving the new table under a new name.

> Recursive view `df` detected (cycle: `df` -> `df`)
> --
>
> Key: SPARK-37690
> URL: https://issues.apache.org/jira/browse/SPARK-37690
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Robin
> Priority: Major
>
> In Spark 3.2.0, you can no longer reuse the same name for a temporary view.
> This change is backwards incompatible, and means a common way of running
> pipelines of SQL queries no longer works. The following is a simple
> reproducible example that works in Spark 2.x and 3.1.2, but not in 3.2.0:
> {code:python}
> from pyspark.context import SparkContext
> from pyspark.sql import SparkSession
> sc = SparkContext.getOrCreate()
> spark = SparkSession(sc)
> sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """
> df = spark.sql(sql)
> df.createOrReplaceTempView("df")
> sql = """ SELECT * FROM df """
> df = spark.sql(sql)
> df.createOrReplaceTempView("df")
> sql = """ SELECT * FROM df """
> df = spark.sql(sql)
> {code}
> The following error is now produced:
> {code:python}
> AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
> {code}
> I'm reasonably sure this change is unintentional in 3.2.0 since it breaks a
> lot of legacy code, and the `createOrReplaceTempView` method is named
> explicitly such that replacing an existing view should be allowed. An
> internet search suggests other users have run into similar problems, e.g.
> [here|https://community.databricks.com/s/question/0D53f1Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using]

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
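
The workaround mentioned in the comment above (on 3.2, save each result under a new name) could look roughly like this. It is a minimal sketch, not code from the issue: `run_pipeline`, the `{prev}` placeholder and the `step_N` naming scheme are all hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
def run_pipeline(spark, first_query, later_queries, prefix="step"):
    """Run SQL steps, registering each result under a fresh temp view name.

    Hypothetical helper: every query in `later_queries` contains a `{prev}`
    placeholder that is filled with the previous step's view name, so no
    view is ever replaced by a query that reads from that same view.
    """
    df = spark.sql(first_query)
    view = f"{prefix}_0"
    df.createOrReplaceTempView(view)
    names = [view]
    for i, template in enumerate(later_queries, start=1):
        # Read from the previous view, then register under a new name.
        df = spark.sql(template.format(prev=view))
        view = f"{prefix}_{i}"
        df.createOrReplaceTempView(view)
        names.append(view)
    return df, names


# Usage, assuming a live session named `spark`:
# df, views = run_pipeline(
#     spark,
#     "SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)",
#     ["SELECT * FROM {prev}", "SELECT * FROM {prev}"],
# )
```

Because the templates go through `str.format`, a query that contains literal braces would need them doubled.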
[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)
[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578422#comment-17578422 ]

Z. S. commented on SPARK-37690:
---
If anyone figures out a workaround please let us know.
[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)
[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502743#comment-17502743 ]

Yishai Chernovitzky commented on SPARK-37690:
-
It seems to be resolved by SPARK-38318
[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)
[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478861#comment-17478861 ]

Kiran commented on SPARK-37690:
---
Got this issue with Spark 3.2.0. Looking for workarounds, but none have worked so far.
[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)
[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17477141#comment-17477141 ]

Daniel Darabos commented on SPARK-37690:

We've hit this too with Spark 3.2.0. Could this be fallout from [SPARK-34546|https://issues.apache.org/jira/browse/SPARK-34546]? It changed exactly where the query for views is analyzed, and it was added in 3.2.0. [~imback82], what do you think?

Here's a repro for the Scala Spark Shell:
{code:java}
scala> Seq((1, 2)).toDF.createOrReplaceTempView("x")
scala> spark.sql("select * from x").createOrReplaceTempView("x")
org.apache.spark.sql.AnalysisException: Recursive view `x` detected (cycle: `x` -> `x`)
{code}
[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)
[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464354#comment-17464354 ]

Robin commented on SPARK-37690:
---
Someone [here|https://community.databricks.com/s/question/0D53f1Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using] has suggested this is an intentional breaking change introduced in Spark 3.1:

From [Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation (apache.org)|https://spark.apache.org/docs/3.1.1/sql-migration-guide.html]:

> In Spark 3.1, the temporary view will have same behaviors with the permanent
> view, i.e. capture and store runtime SQL configs, SQL text, catalog and
> namespace. The captured view properties will be applied during the parsing
> and analysis phases of the view resolution. To restore the behavior before
> Spark 3.1, *you can set spark.sql.legacy.storeAnalyzedPlanForView to
> true*.

Grateful if someone could clarify. Worth noting that the example code works in Spark 3.1.2, just not 3.2.0. It's not obvious to me that the above quote implies `createOrReplaceTempView` would fail in the example code posted in the issue.
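
The setting quoted from the migration guide can be applied to a running session as sketched below. Nothing in this thread confirms that it avoids the 3.2.0 error, so treat it as something to try rather than a verified fix. `use_legacy_temp_views` is a hypothetical helper name, and `spark` is assumed to be an existing SparkSession whose runtime config accepts this key.

```python
# Config key taken verbatim from the Spark 3.1 migration guide quote.
LEGACY_VIEW_CONF = "spark.sql.legacy.storeAnalyzedPlanForView"


def use_legacy_temp_views(spark, enabled=True):
    """Toggle the legacy temp view behavior on an existing session.

    Hypothetical helper: it only calls spark.conf.set with the key above and
    returns the string value that was written.
    """
    value = "true" if enabled else "false"
    spark.conf.set(LEGACY_VIEW_CONF, value)
    return value


# Usage, assuming a live session named `spark`:
# use_legacy_temp_views(spark)
# df = spark.sql("SELECT * FROM df")
# df.createOrReplaceTempView("df")
```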