HyukjinKwon commented on a change in pull request #33323: URL: https://github.com/apache/spark/pull/33323#discussion_r669261712
########## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ########## @@ -956,6 +956,31 @@ class Dataset[T] private[sql]( join(right, Seq(usingColumn)) } + /** + * (Java-specific) Inner equi-join with another `DataFrame` using the given columns. + * + * Different from other join functions, the join columns will only appear once in the output, + * i.e. similar to SQL's `JOIN USING` syntax. + * + * {{{ + * // Joining df1 and df2 using the columns "user_id" and "user_name" + * df1.join(df2, new String[] {"user_id", "user_name"}); + * }}} + * + * @param right Right side of the join operation. + * @param usingColumns Names of the columns to join on. These columns must exist on both sides. + * + * @note If you perform a self-join using this function without aliasing the input + * `DataFrame`s, you will NOT be able to reference any columns after the join, since + * there is no way to disambiguate which side of the join you would like to reference. + * + * @group untypedrel + * @since 3.1.3 + */ + def join(right: Dataset[_], usingColumns: Array[String]): DataFrame = { + join(right, usingColumns.toSeq) + } + /** * Inner equi-join with another `DataFrame` using the given columns. Review comment: Can you add "(Scala-specific)" here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For queries about this service, please contact Infrastructure at: users@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org