[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Styles updated SPARK-20463:
-----------------------------------
Description:
Add support for the SQL standard distinct predicate to Spark SQL.

{noformat}
<expression> IS [NOT] DISTINCT FROM <expression>
{noformat}

{noformat}
data = [(10, 20), (30, 30), (40, None), (None, None)]
df = sc.parallelize(data).toDF(["c1", "c2"])
df.createTempView("df")
spark.sql("select c1, c2 from df where c1 is not distinct from c2").collect()
[Row(c1=30, c2=30), Row(c1=None, c2=None)]
{noformat}

was:
Expose the Spark SQL '<=>' operator in PySpark as a column function called *isNotDistinctFrom*. For example:

{panel}
{noformat}
data = [(10, 20), (30, 30), (40, None), (None, None)]
df2 = sc.parallelize(data).toDF(["c1", "c2"])
df2.where(df2["c1"].isNotDistinctFrom(df2["c2"])).collect()
[Row(c1=30, c2=30), Row(c1=None, c2=None)]
{noformat}
{panel}

> Add support for IS [NOT] DISTINCT FROM to Spark SQL
> ---------------------------------------------------
>
>                 Key: SPARK-20463
>                 URL: https://issues.apache.org/jira/browse/SPARK-20463
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Michael Styles
>
> Add support for the SQL standard distinct predicate to Spark SQL.
> {noformat}
> <expression> IS [NOT] DISTINCT FROM <expression>
> {noformat}
> {noformat}
> data = [(10, 20), (30, 30), (40, None), (None, None)]
> df = sc.parallelize(data).toDF(["c1", "c2"])
> df.createTempView("df")
> spark.sql("select c1, c2 from df where c1 is not distinct from c2").collect()
> [Row(c1=30, c2=30), Row(c1=None, c2=None)]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
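As background on the predicate requested above: IS NOT DISTINCT FROM is null-safe equality, i.e. it treats two NULLs as equal and a NULL against a non-NULL as unequal, whereas ordinary SQL `=` yields unknown whenever either side is NULL. A minimal plain-Python sketch of those semantics (illustrative only, not Spark code; the helper name `is_not_distinct_from` is invented for this sketch, with `None` standing in for SQL NULL):

{noformat}
def is_not_distinct_from(a, b):
    # NULL IS NOT DISTINCT FROM NULL -> true (unlike NULL = NULL, which is unknown)
    if a is None and b is None:
        return True
    # NULL against a non-NULL value -> false
    if a is None or b is None:
        return False
    # both non-NULL: ordinary equality
    return a == b

# Same rows as the JIRA example above
data = [(10, 20), (30, 30), (40, None), (None, None)]
matches = [(c1, c2) for c1, c2 in data if is_not_distinct_from(c1, c2)]
# matches == [(30, 30), (None, None)], mirroring the rows the query returns
{noformat}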