[jira] [Created] (SPARK-45214) Columns should not be visible for filter after projection

Jakub Wozniak (Jira) Tue, 19 Sep 2023 01:50:04 -0700

Jakub Wozniak created SPARK-45214:
-------------------------------------

             Summary: Columns should not be visible for filter after projection
                 Key: SPARK-45214
                 URL: https://issues.apache.org/jira/browse/SPARK-45214
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Jakub Wozniak



After a projection, the dropped columns are still visible to a filter, although 
they are no longer visible to a select. Moreover, the behaviour differs after a 
union: in that case the dropped columns are no longer visible to the filter either.
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StructField, StructType

# Outside the pyspark shell there is no predefined `spark` session.
spark = SparkSession.builder.getOrCreate()

data1 = []
data2 = []

for i in range(2):
    data1.append((1, i))
    data2.append((2, i + 10))

schema1 = StructType([
    StructField('f1', IntegerType(), True),
    StructField('f2', IntegerType(), True),
])

df1 = spark.createDataFrame(data1, schema1)
df2 = spark.createDataFrame(data2, schema1)

df1.show()
df2.show()

# Works: f1 is available for the filter (though it should not be).
df1.select('f2').where('f1=1').show()

# Error: f1 is not available.
df1.select('f2').union(df2.select('f2')).where('f1=1').show()

# The two cases are not treated symmetrically, which is semantically incorrect.

{code}

This is similar to https://issues.apache.org/jira/browse/SPARK-30421.
Perhaps this example gives further grounds for fixing the behaviour, since it is 
logically incorrect.




