[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2022-04-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527751#comment-17527751
 ] 

Apache Spark commented on SPARK-35739:
--

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/36343

> [Spark SQL] Add Java-compatible Dataset.join overloads
> -
>
> Key: SPARK-35739
> URL: https://issues.apache.org/jira/browse/SPARK-35739
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Brandon Dahler
>Priority: Minor
>
> h2. Problem
> When using Spark SQL with Java, the syntax required to use the following
> two overloads is unnatural and not obvious to developers who haven't had to
> interoperate with Scala before:
> {code:java}
> def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame
> def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
> {code}
> Examples:
> Java 11 
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> // Overload with multiple usingColumns, no join type
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2")))
>   .show();
> // Overload with multiple usingColumns and a join type
> dataset1
>   .join(
> dataset2,
> JavaConverters.asScalaBuffer(List.of("column", "column2")),
> "left")
>   .show();
> {code}
>  
>  Additionally, there is no overload that takes a single usingColumn and a
> joinType, forcing the developer to use the Seq[String] overload regardless of
> language.
> Examples:
> Scala
> {code:java}
> val dataset1: DataFrame = ...
> val dataset2: DataFrame = ...
> dataset1
>   .join(dataset2, Seq("column"), "left")
>   .show()
> {code}
>  
>  Java 11
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left")
>   .show();
> {code}
> h2. Proposed Improvement
> Add 3 additional overloads to Dataset:
>   
> {code:java}
> def join(right: Dataset[_], usingColumn: List[String]): DataFrame
> def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame
> def join(right: Dataset[_], usingColumn: List[String], joinType: String): DataFrame
> {code}
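
For a sense of the intended ergonomics, here is a rough sketch of what the Java call sites could look like if overloads along these lines were added (illustrative only; it assumes the List parameters are java.util.List and reuses the dataset1/dataset2 placeholders from the description):

{code:java}
// Hypothetical usage of the proposed overloads; not a shipped API.
Dataset<Row> dataset1 = ...;
Dataset<Row> dataset2 = ...;

// Multiple join columns plus a join type, with no JavaConverters involved.
dataset1
  .join(dataset2, List.of("column", "column2"), "left")
  .show();

// Single join column plus a join type.
dataset1
  .join(dataset2, "column", "left")
  .show();
{code}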






[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2021-12-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460777#comment-17460777
 ] 

Apache Spark commented on SPARK-35739:
--

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/34923







[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2021-12-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460775#comment-17460775
 ] 

Apache Spark commented on SPARK-35739:
--

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/34923







[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379898#comment-17379898
 ] 

Apache Spark commented on SPARK-35739:
--

User 'brandondahler' has created a pull request for this issue:
https://github.com/apache/spark/pull/33323

> [Spark SQL] Add Java-compatible Dataset.join overloads
> -
>
> Key: SPARK-35739
> URL: https://issues.apache.org/jira/browse/SPARK-35739
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Brandon Dahler
>Priority: Minor
>
> h2. Problem
> When using Spark SQL with Java, the syntax required to use the following
> two overloads is unnatural and not obvious to developers who haven't had to
> interoperate with Scala before:
> {code:java}
> def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame
> def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
> {code}
> Examples:
> Java 11 
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> // Overload with multiple usingColumns, no join type
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2")))
>   .show();
> // Overload with multiple usingColumns and a join type
> dataset1
>   .join(
> dataset2,
> JavaConverters.asScalaBuffer(List.of("column", "column2")),
> "left")
>   .show();
> {code}
>  
> Additionally, there is no overload that takes a single usingColumn and a
> joinType, forcing the developer to use the Seq[String] overload regardless of
> language.
> Examples:
> Scala
> {code:java}
> val dataset1: DataFrame = ...
> val dataset2: DataFrame = ...
> dataset1
>   .join(dataset2, Seq("column"), "left")
>   .show()
> {code}
>  
> Java 11
> {code:java}
> Dataset<Row> dataset1 = ...;
> Dataset<Row> dataset2 = ...;
> dataset1
>   .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left")
>   .show();
> {code}
> h2. Proposed Improvement
> Add 3 additional overloads to Dataset:
>  
> {code:java}
> def join(right: Dataset[_], usingColumn: List[String]): DataFrame
> def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame
> def join(right: Dataset[_], usingColumn: List[String], joinType: String): DataFrame
> {code}
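
A side note on the JavaConverters workaround quoted above: scala.collection.JavaConverters is deprecated on Scala 2.13, and the Scala 2.13 build of Spark expects an immutable Seq, so the conversion needs an explicit toSeq. A minimal sketch, assuming a Spark build on Scala 2.13 and the same dataset1/dataset2 placeholders:

{code:java}
// Sketch of the same workaround using the Java-facing converter API from Scala 2.13
// (assumption: the Spark artifacts in use are the _2.13 build). toSeq() turns the
// mutable Buffer returned by asScala into the immutable Seq the join signature expects.
import scala.jdk.javaapi.CollectionConverters;

dataset1
  .join(dataset2, CollectionConverters.asScala(List.of("column", "column2")).toSeq(), "left")
  .show();
{code}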






[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2021-06-14 Thread Brandon Dahler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362955#comment-17362955
 ] 

Brandon Dahler commented on SPARK-35739:


I can do an array instead of a List; note that it will have to be a normal array
parameter and not a varargs parameter, as the overloads needed aren't compatible
with varargs (see the sketch below).

I'll plan on starting a PR for it soon.
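
To make the varargs constraint concrete, here is a toy Java sketch (hypothetical methods, not Spark's Dataset API): if the column list were varargs alongside a (usingColumn, joinType) overload, any call passing exactly two strings resolves to the fixed-arity method, so joining on two columns would silently change meaning.

{code:java}
// Toy illustration of the overload clash; the join methods here are hypothetical.
public class VarargsClash {

    // Hypothetical varargs overload: join on one or more columns.
    static String join(String right, String... usingColumns) {
        return "join " + right + " using columns [" + String.join(", ", usingColumns) + "]";
    }

    // Hypothetical (usingColumn, joinType) overload.
    static String join(String right, String usingColumn, String joinType) {
        return "join " + right + " using [" + usingColumn + "] as " + joinType;
    }

    public static void main(String[] args) {
        // Intended as two join columns, but Java prefers the fixed-arity overload,
        // so "column2" is interpreted as a join type instead.
        System.out.println(join("dataset2", "column", "column2"));
    }
}
{code}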







[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2021-06-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362694#comment-17362694
 ] 

Hyukjin Kwon commented on SPARK-35739:
--

Can you use Array instead of List? Otherwise, I think it's fine to add them for
Java users.
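
For reference, an Array[String] parameter surfaces as a plain String[] on the Java side, so no converter is needed. Hypothetical call sites, assuming Array-based overloads of this shape existed (illustrative only, not a committed API):

{code:java}
// Illustrative only; assumes Array[String]-based join overloads.
Dataset<Row> dataset1 = ...;
Dataset<Row> dataset2 = ...;

dataset1
  .join(dataset2, new String[] {"column", "column2"}, "left")
  .show();

dataset1
  .join(dataset2, new String[] {"column"})
  .show();
{code}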







[jira] [Commented] (SPARK-35739) [Spark SQL] Add Java-compatible Dataset.join overloads

2021-06-13 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362695#comment-17362695
 ] 

Hyukjin Kwon commented on SPARK-35739:
--

Feel free to go ahead with a PR if you find some time, [~brandon.dahler.amazon].



