[ https://issues.apache.org/jira/browse/SPARK-30957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049890#comment-17049890 ]
Hyukjin Kwon commented on SPARK-30957:
--------------------------------------

I currently don't think this is particularly useful. We don't have methods like joinSelf or joinRange, which would be too verbose. The workarounds look easy enough.

> Null-safe variant of Dataset.join(Dataset[_], Seq[String])
> ----------------------------------------------------------
>
>                 Key: SPARK-30957
>                 URL: https://issues.apache.org/jira/browse/SPARK-30957
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> The {{Dataset.join(Dataset, Seq[String])}} method provides extra convenience
> over {{Dataset.join(Dataset, joinExprs: Column)}}, as it does not duplicate
> the join columns {{Seq[String]}} in the result {{DataFrame}}. Those columns
> are compared with {{===}}. When those join columns need to be compared
> null-safe with {{<=>}}, the join condition becomes very verbose and requires
> extra {{drop}} operations:
> {code:java}
> df1.join(df2, df1("a") <=> df2("a") && df1("b") <=>
> df2("b")).drop(df2("a")).drop(df2("b")).show()
> {code}
> A more elegant null-safe join operation would be:
> {code:java}
> df1.joinNullSafe(df2, joinColumns)
> {code}
> Possible namings:
> - {{Dataset.joinNullSafe(Dataset[_], Seq[String])}}
> - {{Dataset.joinWithNulls(Dataset[_], Seq[String])}}
> - {{Dataset.join(Dataset[_], Seq[String], <=>)}}
> *I am happy to provide a PR if this Dataset API extension is appreciated.*
> This request has been sent to the Apache Spark user and
> [dev|http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-dataframe-null-safe-joins-given-a-list-of-columns-tt28842.html]
> mailing lists by Marcelo Valle.
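As a side note, the workaround referenced in the comment generalizes cleanly over a list of column names. A minimal sketch in Spark's Scala Dataset API; the {{joinNullSafe}} helper name is hypothetical, mirroring the proposal rather than any existing Spark method:

{code:java}
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of the Spark API: performs a null-safe
// equi-join on the given column names with <=>, then drops the duplicated
// right-side columns, i.e. the description's workaround generalized
// over Seq[String]. Assumes cols is non-empty.
def joinNullSafe(left: DataFrame, right: DataFrame, cols: Seq[String]): DataFrame = {
  val condition = cols.map(c => left(c) <=> right(c)).reduce(_ && _)
  cols.foldLeft(left.join(right, condition))((df, c) => df.drop(right(c)))
}

// Usage: joinNullSafe(df1, df2, Seq("a", "b")).show()
{code}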