[ https://issues.apache.org/jira/browse/SPARK-37954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490639#comment-17490639 ]

L. C. Hsieh commented on SPARK-37954:
-------------------------------------

I quickly scanned what we did, and it looks intentional: we resolve the
missing references for Filter. Treating the two cases

df.select(df("name")).filter(df("id") === 0).show()

and

df = df.select(df("name"))
df.filter(df("id") === 0).show()

separately seems like a weird idea. I don't think (or remember) that we expect
them to be two different things.



> old columns should not be available after select or drop
> --------------------------------------------------------
>
>                 Key: SPARK-37954
>                 URL: https://issues.apache.org/jira/browse/SPARK-37954
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.0.1
>            Reporter: Jean Bon
>            Priority: Major
>
>  
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import col
> spark = SparkSession.builder.appName('available_columns').getOrCreate()
> df = spark.range(5).select((col("id")+10).alias("id2"))
> assert df.columns==["id2"] #OK
> try:
>     df.select("id")
>     error_raise = False
> except:
>     error_raise = True
> assert error_raise #OK
> df = df.drop("id") #should raise an error
> df.filter(col("id")!=2).count() #returns 4, should raise an error
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
