[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061398#comment-16061398 ]
Michael Armbrust commented on SPARK-21110:
------------------------------------------

It seems if you can call {{min}} and {{max}} on structs you should be able to use comparison operations as well.

> Structs should be usable in inequality filters
> ----------------------------------------------
>
>                 Key: SPARK-21110
>                 URL: https://issues.apache.org/jira/browse/SPARK-21110
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> It seems like a missing feature that you can't compare structs in a filter on
> a DataFrame.
> Here's a simple demonstration of a) where this would be useful and b) how
> it's different from simply comparing each of the components of the structs.
> {code}
> import pyspark
> from pyspark.sql.functions import col, struct, concat
> spark = pyspark.sql.SparkSession.builder.getOrCreate()
> df = spark.createDataFrame(
>     [
>         ('Boston', 'Bob'),
>         ('Boston', 'Nick'),
>         ('San Francisco', 'Bob'),
>         ('San Francisco', 'Nick'),
>     ],
>     ['city', 'person']
> )
> pairs = (
>     df.select(
>         struct('city', 'person').alias('p1')
>     )
>     .crossJoin(
>         df.select(
>             struct('city', 'person').alias('p2')
>         )
>     )
> )
> print("Everything")
> pairs.show()
> print("Comparing parts separately (doesn't give me what I want)")
> (pairs
>     .where(col('p1.city') < col('p2.city'))
>     .where(col('p1.person') < col('p2.person'))
>     .show())
> print("Comparing parts together with concat (gives me what I want but is hacky)")
> (pairs
>     .where(concat('p1.city', 'p1.person') < concat('p2.city', 'p2.person'))
>     .show())
> print("Comparing parts together with struct (my desired solution but currently yields an error)")
> (pairs
>     .where(col('p1') < col('p2'))
>     .show())
> {code}
> The last query yields the following error in Spark 2.1.1:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve '(`p1` < `p2`)' due to data type mismatch: '(`p1` < `p2`)' requires (boolean or tinyint or smallint or int or bigint or float or double or decimal or timestamp or date or string or binary) type, not struct<city:string,person:string>;;
> 'Filter (p1#5 < p2#8)
> +- Join Cross
>    :- Project [named_struct(city, city#0, person, person#1) AS p1#5]
>    :  +- LogicalRDD [city#0, person#1]
>    +- Project [named_struct(city, city#0, person, person#1) AS p2#8]
>       +- LogicalRDD [city#0, person#1]
> {code}
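
For readers without a Spark session handy, the difference between the three filters in the quoted issue can be sketched in plain Python. This is only an illustration, not Spark code. It assumes that a struct comparison would order values field by field, left to right, which is how Python tuples compare. The names `componentwise`, `lexicographic`, and `concatenated` are made up for this sketch.

```python
from itertools import product

rows = [
    ('Boston', 'Bob'),
    ('Boston', 'Nick'),
    ('San Francisco', 'Bob'),
    ('San Francisco', 'Nick'),
]

# Same shape as the cross join in the issue: every (p1, p2) combination.
pairs = list(product(rows, rows))

# Comparing parts separately: both fields must be strictly smaller.
componentwise = [(p1, p2) for p1, p2 in pairs
                 if p1[0] < p2[0] and p1[1] < p2[1]]

# Comparing parts together: tuples compare field by field, left to right
# (assumed here to mirror the desired struct ordering).
lexicographic = [(p1, p2) for p1, p2 in pairs if p1 < p2]

# The concat workaround from the issue.
concatenated = [(p1, p2) for p1, p2 in pairs
                if p1[0] + p1[1] < p2[0] + p2[1]]

print(len(pairs), len(componentwise), len(lexicographic), len(concatenated))

# Concatenation loses the field boundary, so it can disagree with a
# field-by-field comparison on other data.
a, b = ('ab', 'c'), ('a', 'bd')
print(a < b, a[0] + a[1] < b[0] + b[1])
```

On the four rows from the issue, the concat workaround and the field-by-field comparison select the same pairs, while the component-wise filter keeps far fewer. The last two lines show why concat is still only a workaround: once the field boundary is lost, the two orderings can disagree.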