[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-35739:
------------------------------------

    Assignee: Apache Spark

> [Spark Sql] Add Java-compatible Dataset.join overloads
> ------------------------------------------------------
>
>                 Key: SPARK-35739
>                 URL: https://issues.apache.org/jira/browse/SPARK-35739
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API, SQL
>    Affects Versions: 2.0.0, 3.0.0
>            Reporter: Brandon Dahler
>            Assignee: Apache Spark
>            Priority: Minor
>
> h2. Problem
> When using Spark SQL with Java, the syntax required to use the following
> two overloads is unnatural and not obvious to developers who have not had to
> interoperate with Scala before:
> {code:java}
> def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame
> def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
> {code}
> Examples:
> Java 11
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> // Overload with multiple usingColumns, no join type
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2")))
>   .show();
> // Overload with multiple usingColumns and a join type
> dataset1
>   .join(
>     dataset2,
>     JavaConverters.asScalaBuffer(List.of("column", "column2")),
>     "left")
>   .show();
> {code}
>
> Additionally, there is no overload that takes a single usingColumn and a
> joinType, forcing the developer to use the Seq[String] overload regardless of
> language.
> Examples:
> Scala
> {code:java}
> val dataset1: DataFrame = ...;
> val dataset2: DataFrame = ...;
> dataset1
>   .join(dataset2, Seq("column"))
>   .show();
> {code}
>
> Java 11
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")))
>   .show();
> {code}
> h2. Proposed Improvement
> Add 3 additional overloads to Dataset:
>
> {code:java}
> def join(right: Dataset[_], usingColumn: List[String]): DataFrame
> def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame
> def join(right: Dataset[_], usingColumn: List[String], joinType: String): DataFrame
> {code}
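
For illustration only, the three proposed overload shapes can be tried out from the Java side without Spark on the classpath. The sketch below is not Spark code: JoinOverloadSketch is a hypothetical stand-in for Dataset that returns a text description of the join instead of performing it. It assumes that List[String] in the proposal means java.util.List<String>, and that the overload without a joinType keeps the "inner" default of the existing Seq[String] overload.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Dataset (not a Spark class): it only describes the
// join that would be requested, so the overload shapes can be exercised alone.
final class JoinOverloadSketch {
    private final String name;

    JoinOverloadSketch(String name) {
        this.name = name;
    }

    // Proposed shape 1: several using columns, no join type.
    // Assumption: defaults to "inner", like the existing Seq[String] overload.
    String join(JoinOverloadSketch right, List<String> usingColumns) {
        return join(right, usingColumns, "inner");
    }

    // Proposed shape 2: a single using column plus a join type.
    String join(JoinOverloadSketch right, String usingColumn, String joinType) {
        return join(right, List.of(usingColumn), joinType);
    }

    // Proposed shape 3: several using columns plus a join type.
    // A real implementation would convert the java.util.List to a Scala Seq
    // here and delegate to the existing overload; this sketch just formats text.
    String join(JoinOverloadSketch right, List<String> usingColumns, String joinType) {
        return name + " " + joinType.toUpperCase(Locale.ROOT) + " JOIN " + right.name
                + " USING (" + String.join(", ", usingColumns) + ")";
    }
}
```

With such overloads, a Java caller would pass List.of("column", "column2") or a plain "column" directly, and no JavaConverters call would appear at the call site. Funnelling the two shorter overloads into the three-argument one keeps the Scala conversion in a single place.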