[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527751#comment-17527751 ]

Apache Spark commented on SPARK-35739:
--------------------------------------

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/36343

> [Spark Sql] Add Java-comptable Dataset.join overloads
> ------------------------------------------------------
>
> Key: SPARK-35739
> URL: https://issues.apache.org/jira/browse/SPARK-35739
> Project: Spark
> Issue Type: Improvement
> Components: Java API, SQL
> Affects Versions: 2.0.0, 3.0.0
> Reporter: Brandon Dahler
> Priority: Minor
>
> h2. Problem
> When using Spark SQL with Java, the syntax required to use the following
> two overloads is unnatural and not obvious to developers who haven't had to
> interoperate with Scala before:
> {code:java}
> def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame
> def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
> {code}
> Examples:
> Java 11
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> // Overload with multiple usingColumns, no join type
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2")))
>   .show();
> // Overload with multiple usingColumns and a join type
> dataset1
>   .join(
>     dataset2,
>     JavaConverters.asScalaBuffer(List.of("column", "column2")),
>     "left")
>   .show();
> {code}
>
> Additionally, there is no overload that takes a single usingColumn and a
> joinType, forcing the developer to use the Seq[String] overload regardless of
> language.
> Examples:
> Scala
> {code:java}
> val dataset1: DataFrame = ...
> val dataset2: DataFrame = ...
> dataset1
>   .join(dataset2, Seq("column"), "left")
>   .show()
> {code}
>
> Java 11
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left")
>   .show();
> {code}
> h2. Proposed Improvement
> Add 3 additional overloads to Dataset:
>
> {code:java}
> def join(right: Dataset[_], usingColumn: List[String]): DataFrame
> def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame
> def join(right: Dataset[_], usingColumn: List[String], joinType: String): DataFrame
> {code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460777#comment-17460777 ]

Apache Spark commented on SPARK-35739:
--------------------------------------

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/34923
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460775#comment-17460775 ]

Apache Spark commented on SPARK-35739:
--------------------------------------

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/34923
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379898#comment-17379898 ]

Apache Spark commented on SPARK-35739:
--------------------------------------

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/33323
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362955#comment-17362955 ]

Brandon Dahler commented on SPARK-35739:
----------------------------------------

I can use an array instead of List. Note that it will have to be a normal array parameter and not a varargs parameter, as the overloads needed aren't compatible with a varargs parameter. I plan on starting a PR for it soon.
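
The comment above does not spell out why a varargs parameter is incompatible with the needed overloads. One plausible reading, sketched below in plain Java, is that a varargs column list would collide with the proposed (usingColumn, joinType) overload. The class, the method names `joinVarargs` and `joinArray`, and the use of a `String` in place of a Spark Dataset are all hypothetical stand-ins, not Spark API.

```java
import java.util.Arrays;

// Stand-in only: "right" is a plain String instead of a Spark Dataset,
// and every method name in this class is hypothetical.
final class VarargsOverloadSketch {

    // Mirrors the shape of the proposed join(right, usingColumn, joinType).
    static String joinVarargs(String right, String usingColumn, String joinType) {
        return "single column [" + usingColumn + "], joinType=" + joinType;
    }

    // Hypothetical varargs overload meant to accept several join columns.
    static String joinVarargs(String right, String... usingColumns) {
        return "columns " + Arrays.toString(usingColumns) + ", joinType=inner";
    }

    // The same overload set, but with a plain array parameter.
    static String joinArray(String right, String usingColumn, String joinType) {
        return "single column [" + usingColumn + "], joinType=" + joinType;
    }

    static String joinArray(String right, String[] usingColumns) {
        return "columns " + Arrays.toString(usingColumns) + ", joinType=inner";
    }

    static String joinArray(String right, String[] usingColumns, String joinType) {
        return "columns " + Arrays.toString(usingColumns) + ", joinType=" + joinType;
    }

    public static void main(String[] args) {
        // Intended as two join columns, but Java prefers the fixed-arity
        // overload, so "name" is silently treated as the join type.
        System.out.println(joinVarargs("right", "id", "name"));

        // With an explicit array the column list is unambiguous, and a
        // join type can follow it as a separate parameter.
        System.out.println(joinArray("right", new String[] {"id", "name"}));
        System.out.println(joinArray("right", new String[] {"id", "name"}, "left"));
    }
}
```

Java also requires a varargs parameter to be the last one in the signature, so a trailing joinType could not follow a varargs column list in any case.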
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362694#comment-17362694 ]

Hyukjin Kwon commented on SPARK-35739:
--------------------------------------

Can you use Array instead of List? Otherwise I think it's fine to add them for Java users.
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362695#comment-17362695 ]

Hyukjin Kwon commented on SPARK-35739:
--------------------------------------

Feel free to go ahead with a PR if you find some time [~brandon.dahler.amazon]