Jakub Wozniak created SPARK-45214:
-------------------------------------

             Summary: Columns should not be visible for filter after projection
                 Key: SPARK-45214
                 URL: https://issues.apache.org/jira/browse/SPARK-45214
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Jakub Wozniak

Columns that a projection has dropped are still visible to a subsequent filter, although they are no longer visible to a subsequent select. Moreover, the behaviour is different after a union: in that case the dropped columns are no longer visible for filtering either.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StructField, StructType

# Create the session explicitly so the snippet also runs outside the pyspark shell.
spark = SparkSession.builder.getOrCreate()

# Two small data sets with the same schema: (f1, f2).
data1 = [(1, i) for i in range(2)]
data2 = [(2, i + 10) for i in range(2)]

schema1 = StructType([
    StructField('f1', IntegerType(), True),
    StructField('f2', IntegerType(), True),
])

df1 = spark.createDataFrame(data1, schema1)
df2 = spark.createDataFrame(data2, schema1)

df1.show()
df2.show()

# Works: f1 is available for the filter (though it should not be).
df1.select('f2').where('f1=1').show()

# Error: f1 is not available.
df1.select('f2').union(df2.select('f2')).where('f1=1').show()

# The two cases are not treated symmetrically, which is semantically incorrect.
{code}

This is similar to https://issues.apache.org/jira/browse/SPARK-30421 and perhaps gives a further argument for fixing it, since the current behaviour is logically incorrect.
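The asymmetry described above can be illustrated with a toy model in plain Python, with no Spark required. This is only a sketch under an assumption that the report itself does not state: that the analyzer can recover a missing filter column by looking through a projection, but stops at a union. The names `Scan`, `Project`, `Union` and `can_filter_on` are hypothetical and exist only for this illustration; they are not Spark classes or APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scan:
    """Leaf node: a data source exposing a fixed list of columns."""
    columns: List[str]

    def output(self) -> List[str]:
        return list(self.columns)


@dataclass
class Project:
    """Projection: keeps only `columns` from its child."""
    columns: List[str]
    child: object

    def output(self) -> List[str]:
        return list(self.columns)


@dataclass
class Union:
    """Union: exposes the columns of its first child."""
    children: List[object]

    def output(self) -> List[str]:
        return self.children[0].output()


def can_filter_on(plan, column: str) -> bool:
    """Return True if a filter on `column` placed above `plan` resolves
    in this toy model."""
    if column in plan.output():
        return True
    # ASSUMPTION: a missing column may be recovered through a projection
    # by looking at the projection's own input.
    if isinstance(plan, Project):
        return can_filter_on(plan.child, column)
    # ASSUMPTION: no such recovery is attempted through a union.
    return False


df1 = Scan(['f1', 'f2'])
df2 = Scan(['f1', 'f2'])

# Mirrors df1.select('f2').where('f1=1')
after_projection = can_filter_on(Project(['f2'], df1), 'f1')

# Mirrors df1.select('f2').union(df2.select('f2')).where('f1=1')
after_union = can_filter_on(
    Union([Project(['f2'], df1), Project(['f2'], df2)]), 'f1'
)

print(after_projection, after_union)  # prints: True False
```

In this model the same dropped column is filterable in one plan and not in the other, which is the inconsistency the report objects to. A strictly consistent rule would make `can_filter_on` return False in both cases.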