[ https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Styles updated SPARK-20463:
-----------------------------------
    Component/s: SQL (was: PySpark)
        Summary: Add support for IS [NOT] DISTINCT FROM to SPARK SQL (was: Expose SPARK SQL <=> operator in PySpark)

> Add support for IS [NOT] DISTINCT FROM to SPARK SQL
> ---------------------------------------------------
>
>                 Key: SPARK-20463
>                 URL: https://issues.apache.org/jira/browse/SPARK-20463
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Michael Styles
>
> Expose the SPARK SQL '<=>' operator in PySpark as a column function called
> *isNotDistinctFrom*. For example:
> {panel}
> {noformat}
> data = [(10, 20), (30, 30), (40, None), (None, None)]
> df2 = sc.parallelize(data).toDF(["c1", "c2"])
> df2.where(df2["c1"].isNotDistinctFrom(df2["c2"])).collect()
> [Row(c1=30, c2=30), Row(c1=None, c2=None)]
> {noformat}
> {panel}
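
For readers unfamiliar with null-safe equality, the semantics requested above can be sketched in plain Python, without Spark. This is only an illustration of how '<=>' (IS NOT DISTINCT FROM) differs from ordinary '=': the helper names `sql_equals` and `is_not_distinct_from` are hypothetical and are not part of any Spark API, and Python's None stands in for SQL NULL.

```python
def sql_equals(a, b):
    """Ordinary SQL '=': the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a, b):
    """Null-safe equality ('<=>'): always True or False, never NULL.

    Two NULLs compare equal; a NULL never equals a non-NULL value.
    """
    if a is None or b is None:
        return a is None and b is None
    return a == b


# Same rows as in the issue description.
data = [(10, 20), (30, 30), (40, None), (None, None)]

# A WHERE clause keeps a row only when the predicate is True, so a NULL
# result from '=' drops the row, while '<=>' keeps the (NULL, NULL) row.
plain_matches = [row for row in data if sql_equals(*row) is True]
null_safe_matches = [row for row in data if is_not_distinct_from(*row)]

print(plain_matches)
print(null_safe_matches)
```

With ordinary equality only the (30, 30) row survives the filter; with the null-safe form the (None, None) row is kept as well, which matches the result shown in the issue description.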