[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values

2023-07-06 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740726#comment-17740726
 ] 

koert kuipers commented on SPARK-37829:
---

Since this behavior of returning a Row with null values has been present since 
Spark 3.0.x (a major breaking release, three years ago), I would argue it is the 
default behavior, and this jira introduces a breaking change.

> An outer-join using joinWith on DataFrames returns Rows with null fields 
> instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: Clément de Groc
>Assignee: Jason Xu
>Priority: Major
> Fix For: 3.3.3, 3.4.1, 3.5.0
>
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return 
> missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with 
> {{null}} values in Spark 3+.
> The issue can be reproduced with [the following 
> test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5]
>  that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case 
> classes work as expected, as demonstrated by [this other 
> test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration guide, so I'm 
> assuming this is a bug.
> A {{git bisect}} pointed me to [that 
> commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59].
> Reverting the commit solves the problem.
> A similar solution, but without reverting, is shown 
> [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.
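
A minimal sketch of the reported difference, assuming a local SparkSession and 
illustrative data (the linked tests are the authoritative repro):

{code:java}
// Hedged sketch of the reported behavior; data and session setup are illustrative.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val left = Seq((1, "a"), (2, "b")).toDF("id", "v")
val right = Seq((1, "x")).toDF("id", "w")

// joinWith on DataFrames yields a Dataset[(Row, Row)].
val joined = left.joinWith(right, left("id") === right("id"), "left_outer")
joined.collect()
// Spark 2.4.8: the unmatched left row pairs with null.
// Spark 3.0.0+ (before the fix): it pairs with Row(null, null) instead.
{code}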






[jira] [Comment Edited] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values

2023-07-06 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740726#comment-17740726
 ] 

koert kuipers edited comment on SPARK-37829 at 7/6/23 5:23 PM:
---

Since this behavior of returning a Row with null values has been present since 
Spark 3.0.x (a major breaking release, three years ago), I would argue it is the 
default behavior, and this jira introduces a breaking change.

Also, ExpressionEncoders are used for purposes other than Dataset joins, and now 
we find nulls popping up in places they should not.


was (Author: koert):
Since this behavior of returning a Row with null values has been present since 
Spark 3.0.x (a major breaking release, three years ago), I would argue it is the 
default behavior, and this jira introduces a breaking change.

> An outer-join using joinWith on DataFrames returns Rows with null fields 
> instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: Clément de Groc
>Assignee: Jason Xu
>Priority: Major
> Fix For: 3.3.3, 3.4.1, 3.5.0
>
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return 
> missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with 
> {{null}} values in Spark 3+.
> The issue can be reproduced with [the following 
> test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5]
>  that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case 
> classes work as expected, as demonstrated by [this other 
> test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration guide, so I'm 
> assuming this is a bug.
> A {{git bisect}} pointed me to [that 
> commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59].
> Reverting the commit solves the problem.
> A similar solution, but without reverting, is shown 
> [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.






[jira] [Comment Edited] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values

2023-07-06 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740726#comment-17740726
 ] 

koert kuipers edited comment on SPARK-37829 at 7/6/23 5:32 PM:
---

Since this (admittedly somewhat weird) behavior of returning a Row with null 
values has been present since Spark 3.0.x (a major breaking release, three 
years ago), I would argue it is the default behavior, and this jira introduces 
a breaking change.

Also, ExpressionEncoders are used for purposes other than Dataset joins, and now 
we find nulls popping up in places they should not.


was (Author: koert):
Since this behavior of returning a Row with null values has been present since 
Spark 3.0.x (a major breaking release, three years ago), I would argue it is the 
default behavior, and this jira introduces a breaking change.

Also, ExpressionEncoders are used for purposes other than Dataset joins, and now 
we find nulls popping up in places they should not.

> An outer-join using joinWith on DataFrames returns Rows with null fields 
> instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: Clément de Groc
>Assignee: Jason Xu
>Priority: Major
> Fix For: 3.3.3, 3.4.1, 3.5.0
>
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return 
> missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with 
> {{null}} values in Spark 3+.
> The issue can be reproduced with [the following 
> test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5]
>  that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case 
> classes work as expected, as demonstrated by [this other 
> test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration guide, so I'm 
> assuming this is a bug.
> A {{git bisect}} pointed me to [that 
> commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59].
> Reverting the commit solves the problem.
> A similar solution, but without reverting, is shown 
> [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.






[jira] [Created] (SPARK-44324) Move CaseInsensitiveMap to sql/api

2023-07-06 Thread Rui Wang (Jira)
Rui Wang created SPARK-44324:


 Summary: Move CaseInsensitiveMap to sql/api
 Key: SPARK-44324
 URL: https://issues.apache.org/jira/browse/SPARK-44324
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Comment Edited] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT

2023-07-06 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740744#comment-17740744
 ] 

koert kuipers edited comment on SPARK-44323 at 7/6/23 6:28 PM:
---

I think the issue is that Nones inside tuples now become nulls.

So it's the usage of nullSafe inside the childrenDeserializers for tuples, 
introduced in [https://github.com/apache/spark/pull/40755]
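
A minimal sketch of the symptom described above, with illustrative data (the 
Aggregator test attached to the issue is the real repro):

{code:java}
// Hedged sketch: per the comment, a None nested inside a tuple can come back
// as null after the change. Data and session setup are illustrative.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val ds = Seq((1, Option("a")), (2, Option.empty[String])).toDS()
ds.collect()
// expected: Array((1, Some(a)), (2, None))
// reported: the nested None surfaces as null instead of None
{code}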


was (Author: koert):
I think the issue is that Nones inside tuples now become null.

So it's the usage of nullSafe inside the childrenDeserializers for tuples, 
introduced in https://github.com/apache/spark/pull/40755

> Scala None shows up as null for Aggregator BUF or OUT  
> ---
>
> Key: SPARK-44323
> URL: https://issues.apache.org/jira/browse/SPARK-44323
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: koert kuipers
>Priority: Major
>
> When doing an upgrade from Spark 3.3.1 to Spark 3.4.1, we suddenly started 
> getting null pointer exceptions in Aggregators (classes extending 
> org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF 
> and/or OUT. Basically, None is now showing up as null.
> After adding a simple test case and doing a binary search on commits, we 
> landed on SPARK-37829 being the cause.
> We observed the issue at first with an NPE inside Aggregator.merge because None 
> was null. I am having a hard time replicating that in a Spark unit test, but 
> I did manage to get a None to become null in the output. A simple test that now 
> fails:
>  
> {code:java}
> diff --git 
> a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala 
> b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> index e9daa825dd4..a1959d7065d 100644
> --- 
> a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> +++ 
> b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> @@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, 
> Int] {
>    def outputEncoder: Encoder[Int] = Encoders.scalaInt
>  }
>  
> +object OptionStringAgg extends Aggregator[Option[String], Option[String], 
> Option[String]] {
> +  override def zero: Option[String] = None
> +  override def reduce(b: Option[String], a: Option[String]): Option[String] 
> = merge(b, a)
> +  override def finish(reduction: Option[String]): Option[String] = reduction
> +  override def merge(b1: Option[String], b2: Option[String]): Option[String] 
> =
> +    b1.map{ b1v => b2.map{ b2v => b1v ++ b2v }.getOrElse(b1v) }.orElse(b2)
> +  override def bufferEncoder: Encoder[Option[String]] = ExpressionEncoder()
> +  override def outputEncoder: Encoder[Option[String]] = ExpressionEncoder()
> +}
> +
>  class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
>    import testImplicits._
>  
> @@ -432,4 +442,15 @@ class DatasetAggregatorSuite extends QueryTest with 
> SharedSparkSession {
>      val agg = df.select(mode(col("a"))).as[String]
>      checkDataset(agg, "3")
>    }
> +
> +  test("typed aggregation: option string") {
> +    val ds = Seq((1, Some("a")), (1, None), (1, Some("c")), (2, None)).toDS()
> +
> +    checkDataset(
> +      ds.groupByKey(_._1).mapValues(_._2).agg(
> +        OptionStringAgg.toColumn
> +      ),
> +      (1, Some("ac")), (2, None)
> +    )
> +  }
>  }
>  {code}






[jira] [Commented] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT

2023-07-06 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740744#comment-17740744
 ] 

koert kuipers commented on SPARK-44323:
---

I think the issue is that Nones inside tuples now become null.

So it's the usage of nullSafe inside the childrenDeserializers for tuples, 
introduced in https://github.com/apache/spark/pull/40755

> Scala None shows up as null for Aggregator BUF or OUT  
> ---
>
> Key: SPARK-44323
> URL: https://issues.apache.org/jira/browse/SPARK-44323
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: koert kuipers
>Priority: Major
>
> When doing an upgrade from Spark 3.3.1 to Spark 3.4.1, we suddenly started 
> getting null pointer exceptions in Aggregators (classes extending 
> org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF 
> and/or OUT. Basically, None is now showing up as null.
> After adding a simple test case and doing a binary search on commits, we 
> landed on SPARK-37829 being the cause.
> We observed the issue at first with an NPE inside Aggregator.merge because None 
> was null. I am having a hard time replicating that in a Spark unit test, but 
> I did manage to get a None to become null in the output. A simple test that now 
> fails:
>  
> {code:java}
> diff --git 
> a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala 
> b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> index e9daa825dd4..a1959d7065d 100644
> --- 
> a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> +++ 
> b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> @@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, 
> Int] {
>    def outputEncoder: Encoder[Int] = Encoders.scalaInt
>  }
>  
> +object OptionStringAgg extends Aggregator[Option[String], Option[String], 
> Option[String]] {
> +  override def zero: Option[String] = None
> +  override def reduce(b: Option[String], a: Option[String]): Option[String] 
> = merge(b, a)
> +  override def finish(reduction: Option[String]): Option[String] = reduction
> +  override def merge(b1: Option[String], b2: Option[String]): Option[String] 
> =
> +    b1.map{ b1v => b2.map{ b2v => b1v ++ b2v }.getOrElse(b1v) }.orElse(b2)
> +  override def bufferEncoder: Encoder[Option[String]] = ExpressionEncoder()
> +  override def outputEncoder: Encoder[Option[String]] = ExpressionEncoder()
> +}
> +
>  class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
>    import testImplicits._
>  
> @@ -432,4 +442,15 @@ class DatasetAggregatorSuite extends QueryTest with 
> SharedSparkSession {
>      val agg = df.select(mode(col("a"))).as[String]
>      checkDataset(agg, "3")
>    }
> +
> +  test("typed aggregation: option string") {
> +    val ds = Seq((1, Some("a")), (1, None), (1, Some("c")), (2, None)).toDS()
> +
> +    checkDataset(
> +      ds.groupByKey(_._1).mapValues(_._2).agg(
> +        OptionStringAgg.toColumn
> +      ),
> +      (1, Some("ac")), (2, None)
> +    )
> +  }
>  }
>  {code}






[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT

2023-07-06 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740789#comment-17740789
 ] 

Serge Rielau commented on SPARK-43438:
--

spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.

> Fix mismatched column list error on INSERT
> --
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> This error message is pretty bad, and common:
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size() and data column size()."
> ]
> },
> It can perhaps be merged with this one, after giving it an ERROR_CLASS:
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> " requires that the data to be inserted have the same number of 
> columns as the target table: target table has  column(s) but 
> the inserted data has  column(s), including  
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  
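
The same repro, restated as a hedged Scala sketch (only the SQL text is from the 
description; the scaffolding assumes an active SparkSession named spark):

{code:java}
// Only the SQL statements are from the issue; the wrapper is illustrative.
spark.sql("CREATE TABLE tabtest(c1 INT, c2 INT)")

// Produces the _LEGACY_ERROR_TEMP_1168 wording: 2 target columns, 1 supplied.
spark.sql("INSERT INTO tabtest SELECT 1")

// Produces the _LEGACY_ERROR_TEMP_1038 wording: user-specified column size (1)
// vs. data column size (3).
spark.sql("INSERT INTO tabtest(c1) SELECT 1, 2, 3")
{code}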






[jira] [Comment Edited] (SPARK-43438) Fix mismatched column list error on INSERT

2023-07-06 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740789#comment-17740789
 ] 

Serge Rielau edited comment on SPARK-43438 at 7/6/23 8:17 PM:
--

spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.


was (Author: JIRAUSER288374):
spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.

> Fix mismatched column list error on INSERT
> --
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> This error message is pretty bad, and common:
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size() and data column size()."
> ]
> },
> It can perhaps be merged with this one, after giving it an ERROR_CLASS:
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> " requires that the data to be inserted have the same number of 
> columns as the target table: target table has  column(s) but 
> the inserted data has  column(s), including  
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  






[jira] [Resolved] (SPARK-43321) Impl Dataset#JoinWith

2023-07-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-43321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-43321.
---
Fix Version/s: 3.5.0
 Assignee: Zhen Li
   Resolution: Fixed

> Impl Dataset#JoinWith
> -
>
> Key: SPARK-43321
> URL: https://issues.apache.org/jira/browse/SPARK-43321
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.5.0
>
>
> Impl missing method JoinWith






[jira] [Created] (SPARK-44325) Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec

2023-07-06 Thread Vinod KC (Jira)
Vinod KC created SPARK-44325:


 Summary: Define the computing logic through PartitionEvaluator API 
and use it in SortMergeJoinExec
 Key: SPARK-44325
 URL: https://issues.apache.org/jira/browse/SPARK-44325
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Define the computing logic through PartitionEvaluator API and use it in 
SortMergeJoinExec






[jira] [Commented] (SPARK-44325) Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec

2023-07-06 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740836#comment-17740836
 ] 

ci-cassandra.apache.org commented on SPARK-44325:
-

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/41884

> Define the computing logic through PartitionEvaluator API and use it in 
> SortMergeJoinExec
> -
>
> Key: SPARK-44325
> URL: https://issues.apache.org/jira/browse/SPARK-44325
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> SortMergeJoinExec






[jira] [Resolved] (SPARK-44315) Move DefinedByConstructorParams to sql/api

2023-07-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44315.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41873
[https://github.com/apache/spark/pull/41873]

> Move DefinedByConstructorParams to sql/api
> --
>
> Key: SPARK-44315
> URL: https://issues.apache.org/jira/browse/SPARK-44315
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44326) Move utils that are used from Scala client to the common modules

2023-07-06 Thread Rui Wang (Jira)
Rui Wang created SPARK-44326:


 Summary: Move utils that are used from Scala client to the common 
modules
 Key: SPARK-44326
 URL: https://issues.apache.org/jira/browse/SPARK-44326
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Assigned] (SPARK-43660) Enable `resample` with Spark Connect

2023-07-06 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43660:
-

Assignee: Haejoon Lee

> Enable `resample` with Spark Connect
> 
>
> Key: SPARK-43660
> URL: https://issues.apache.org/jira/browse/SPARK-43660
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable `resample` with Spark Connect






[jira] [Resolved] (SPARK-43660) Enable `resample` with Spark Connect

2023-07-06 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43660.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41877
[https://github.com/apache/spark/pull/41877]

> Enable `resample` with Spark Connect
> 
>
> Key: SPARK-43660
> URL: https://issues.apache.org/jira/browse/SPARK-43660
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Enable `resample` with Spark Connect






[jira] [Created] (SPARK-44327) Add functions any and len to Scala

2023-07-06 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44327:
-

 Summary: Add functions any and len to Scala
 Key: SPARK-44327
 URL: https://issues.apache.org/jira/browse/SPARK-44327
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329]

2023-07-06 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44328:
--

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[2325-2329]
 Key: SPARK-44328
 URL: https://issues.apache.org/jira/browse/SPARK-44328
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng









[jira] [Updated] (SPARK-44275) Support client-side retries in Spark Connect Scala client

2023-07-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44275:
-
Affects Version/s: 3.5.0
   (was: 3.4.1)

> Support client-side retries in Spark Connect Scala client
> -
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Robert Dillitz
>Priority: Major
>
> Add a configurable retry mechanism to the Scala Connect client similar to the 
> one in the Python client.
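
For readers unfamiliar with the pattern, a rough sketch of what a client-side 
retry policy typically looks like; the names and policy here are illustrative, 
not the actual Connect client API:

{code:java}
// Generic retry with exponential backoff; illustrative only. A real client
// would also restrict retries to retryable error codes.
import scala.util.{Failure, Success, Try}

def withRetry[T](maxRetries: Int = 3, initialBackoffMs: Long = 100)(op: => T): T = {
  var attempt = 0
  var backoffMs = initialBackoffMs
  while (true) {
    Try(op) match {
      case Success(value) => return value
      case Failure(_) if attempt < maxRetries =>
        attempt += 1
        Thread.sleep(backoffMs)  // wait, then double the backoff
        backoffMs *= 2
      case Failure(e) => throw e
    }
  }
  sys.error("unreachable")
}
{code}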






[jira] [Resolved] (SPARK-44275) Support client-side retries in Spark Connect Scala client

2023-07-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44275.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41829
[https://github.com/apache/spark/pull/41829]

> Support client-side retries in Spark Connect Scala client
> -
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Robert Dillitz
>Assignee: Robert Dillitz
>Priority: Major
> Fix For: 3.5.0
>
>
> Add a configurable retry mechanism to the Scala Connect client similar to the 
> one in the Python client.






[jira] [Assigned] (SPARK-44275) Support client-side retries in Spark Connect Scala client

2023-07-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44275:


Assignee: Robert Dillitz

> Support client-side retries in Spark Connect Scala client
> -
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Robert Dillitz
>Assignee: Robert Dillitz
>Priority: Major
>
> Add a configurable retry mechanism to the Scala Connect client similar to the 
> one in the Python client.






[jira] [Resolved] (SPARK-44312) [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent

2023-07-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44312.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41866
[https://github.com/apache/spark/pull/41866]

> [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the 
> user agent
> --
>
> Key: SPARK-44312
> URL: https://issues.apache.org/jira/browse/SPARK-44312
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Robert Dillitz
>Assignee: Robert Dillitz
>Priority: Major
> Fix For: 3.5.0
>
>
> Allow us to prepend a Spark Connect user agent with an environment variable: 
> *SPARK_CONNECT_USER_AGENT*
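
A hedged sketch of the intended effect (names are illustrative; pull request 
41866 has the actual implementation):

{code:java}
// Illustrative only: the env value is prepended to a hypothetical default agent.
val defaultAgent = "spark-connect-scala"
val userAgent = sys.env.get("SPARK_CONNECT_USER_AGENT")
  .map(prefix => s"$prefix $defaultAgent")
  .getOrElse(defaultAgent)
{code}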






[jira] [Assigned] (SPARK-44312) [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent

2023-07-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44312:


Assignee: Robert Dillitz

> [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the 
> user agent
> --
>
> Key: SPARK-44312
> URL: https://issues.apache.org/jira/browse/SPARK-44312
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Robert Dillitz
>Assignee: Robert Dillitz
>Priority: Major
>
> Allow us to prepend a Spark Connect user agent with an environment variable: 
> *SPARK_CONNECT_USER_AGENT*






[jira] [Created] (SPARK-44329) Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and Python

2023-07-06 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44329:
-

 Summary: Add hll_sketch_agg, hll_union_agg, to_varchar, 
try_aes_decrypt to Scala and Python
 Key: SPARK-44329
 URL: https://issues.apache.org/jira/browse/SPARK-44329
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, SQL
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-44330) Define the computing logic through PartitionEvaluator API and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec

2023-07-06 Thread Vinod KC (Jira)
Vinod KC created SPARK-44330:


 Summary: Define the computing logic through PartitionEvaluator API 
and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec
 Key: SPARK-44330
 URL: https://issues.apache.org/jira/browse/SPARK-44330
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Define the computing logic through PartitionEvaluator API and use it in 
BroadcastNestedLoopJoinExec & BroadcastHashJoinExec






[jira] [Created] (SPARK-44331) Add bitmap functions to Scala and Python

2023-07-06 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44331:
-

 Summary: Add bitmap functions to Scala and Python
 Key: SPARK-44331
 URL: https://issues.apache.org/jira/browse/SPARK-44331
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng


* bitmap_bucket_number
* bitmap_bit_position
* bitmap_construct_agg
* bitmap_count
* bitmap_or_agg






[jira] [Assigned] (SPARK-44327) Add functions any and len to Scala

2023-07-06 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-44327:
-

Assignee: Ruifeng Zheng

> Add functions any and len to Scala
> --
>
> Key: SPARK-44327
> URL: https://issues.apache.org/jira/browse/SPARK-44327
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Updated] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]

2023-07-06 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44328:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]  
(was: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329])

> Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
> --
>
> Key: SPARK-44328
> URL: https://issues.apache.org/jira/browse/SPARK-44328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44332) The Executor ID should start with 1 when running a Spark cluster of [N, cores, memory] in local mode

2023-07-06 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44332:
---

 Summary: The Executor ID should start with 1 when running a Spark 
cluster of [N, cores, memory] in local mode
 Key: SPARK-44332
 URL: https://issues.apache.org/jira/browse/SPARK-44332
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Web UI
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Assigned] (SPARK-44299) Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]

2023-07-06 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-44299:


Assignee: BingKun Pan

> Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
> -
>
> Key: SPARK-44299
> URL: https://issues.apache.org/jira/browse/SPARK-44299
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44299) Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]

2023-07-06 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-44299.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41858
[https://github.com/apache/spark/pull/41858]

> Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
> -
>
> Key: SPARK-44299
> URL: https://issues.apache.org/jira/browse/SPARK-44299
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44319) Migrate jersey 2 to jersey 3

2023-07-06 Thread Yang Jie (Jira)
Yang Jie created SPARK-44319:


 Summary: Migrate jersey 2 to jersey 3
 Key: SPARK-44319
 URL: https://issues.apache.org/jira/browse/SPARK-44319
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-44319) Migrate jersey 2 to jersey 3

2023-07-06 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740463#comment-17740463
 ] 

BingKun Pan commented on SPARK-44319:
-

👍🏻

> Migrate jersey 2 to jersey 3
> 
>
> Key: SPARK-44319
> URL: https://issues.apache.org/jira/browse/SPARK-44319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-44320) Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277

2023-07-06 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44320:
---

 Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277
 Key: SPARK-44320
 URL: https://issues.apache.org/jira/browse/SPARK-44320
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-44320) Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]

2023-07-06 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-44320:

Summary: Assign names to the error class 
_LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]  (was: Assign names to the error 
class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277)

> Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
> -
>
> Key: SPARK-44320
> URL: https://issues.apache.org/jira/browse/SPARK-44320
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-44307) Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.

2023-07-06 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740565#comment-17740565
 ] 

Nikita Awasthi commented on SPARK-44307:


User 'maheshk114' has created a pull request for this issue:
https://github.com/apache/spark/pull/41860

> Bloom filter is not added for left outer join if the left side table is 
> smaller than broadcast threshold.
> -
>
> Key: SPARK-44307
> URL: https://issues.apache.org/jira/browse/SPARK-44307
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.4.1
>Reporter: mahesh kumar behera
>Priority: Major
> Fix For: 3.5.0
>
>
> In the case of a left outer join, even if the left-side table is small enough 
> to be broadcast, a shuffle join is used. This follows from the semantics of 
> the left outer join: if the left side were broadcast in a left outer join, 
> the result would be wrong. But this is not taken into account in the bloom 
> filter logic. While injecting the bloom filter, if the left side is smaller 
> than the broadcast threshold, the bloom filter is not added; the logic assumes 
> that the left side will be broadcast and that there is then no need for a 
> bloom filter. This causes the bloom filter optimization to be missed for a 
> left outer join with a small left side and a huge right-side table.
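
A hedged sketch of the shape of the affected query (table names and sizes are 
illustrative; spark is assumed to be an active SparkSession):

{code:java}
// A left side below the broadcast threshold in a LEFT OUTER JOIN: it cannot
// be broadcast (that would give wrong results for this join type), so a
// shuffle join runs, yet the bloom filter injection is still skipped.
val small = spark.range(1000).toDF("id")        // below the broadcast threshold
val huge = spark.range(1000000000L).toDF("id")  // much larger right side
small.join(huge, Seq("id"), "left_outer").count()
{code}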






[jira] [Commented] (SPARK-44275) Support client-side retries in Spark Connect Scala client

2023-07-06 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740568#comment-17740568
 ] 

Nikita Awasthi commented on SPARK-44275:


User 'dillitz' has created a pull request for this issue:
https://github.com/apache/spark/pull/41829

> Support client-side retries in Spark Connect Scala client
> -
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Robert Dillitz
>Priority: Major
>
> Add a configurable retry mechanism to the Scala Connect client similar to the 
> one in the Python client.






[jira] [Resolved] (SPARK-44284) Introduce simple conf system for sql/api

2023-07-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44284.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Introduce simple conf system for sql/api
> ---
>
> Key: SPARK-44284
> URL: https://issues.apache.org/jira/browse/SPARK-44284
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>
> Create a simple conf system for classes in sql/api






[jira] [Resolved] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]

2023-07-06 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-44303.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41863
[https://github.com/apache/spark/pull/41863]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
> --
>
> Key: SPARK-44303
> URL: https://issues.apache.org/jira/browse/SPARK-44303
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]

2023-07-06 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-44303:


Assignee: jiaan.geng

> Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
> --
>
> Key: SPARK-44303
> URL: https://issues.apache.org/jira/browse/SPARK-44303
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-44321) Move ParseException to SQL/API

2023-07-06 Thread Jira
Herman van Hövell created SPARK-44321:
-

 Summary: Move ParseException to SQL/API
 Key: SPARK-44321
 URL: https://issues.apache.org/jira/browse/SPARK-44321
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Herman van Hövell
Assignee: Herman van Hövell









[jira] [Resolved] (SPARK-44283) Move Origin to api

2023-07-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44283.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Move Origin to api
> --
>
> Key: SPARK-44283
> URL: https://issues.apache.org/jira/browse/SPARK-44283
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44322) Make parser use SqlApiConf

2023-07-06 Thread Jira
Herman van Hövell created SPARK-44322:
-

 Summary: Make parser use SqlApiConf
 Key: SPARK-44322
 URL: https://issues.apache.org/jira/browse/SPARK-44322
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.4.1
Reporter: Herman van Hövell
Assignee: Herman van Hövell









[jira] [Resolved] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`

2023-07-06 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44314.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41872
[https://github.com/apache/spark/pull/41872]

> Add a new checkstyle rule to prohibit the use of `@Test(expected = 
> SomeException.class)`
> 
>
> Key: SPARK-44314
> URL: https://issues.apache.org/jira/browse/SPARK-44314
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>
> [https://github.com/junit-team/junit4/wiki/Exception-testing]
>  
> {code:java}
> The expected parameter should be used with care. The above test will pass if 
> any code in the method throws IndexOutOfBoundsException. Using the method you 
> also cannot test the value of the message in the exception, or the state of a 
> domain object after the exception has been thrown. For these reasons, the 
> previous approaches are recommended. {code}
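
For context, the recommended alternative alluded to above looks roughly like 
this; a hedged sketch using JUnit 4.13's assertThrows (the suite and test names 
are made up):

{code:java}
// Unlike @Test(expected = ...), assertThrows returns the exception, so its
// message and any domain-object state can be asserted on afterwards.
import org.junit.Assert.{assertNotNull, assertThrows}
import org.junit.Test

class ExceptionExampleSuite {
  @Test
  def emptyListLookupThrows(): Unit = {
    val ex = assertThrows(
      classOf[IndexOutOfBoundsException],
      () => List.empty[Int](1))
    assertNotNull(ex)  // the exception object is available for further checks
  }
}
{code}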






[jira] [Assigned] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`

2023-07-06 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44314:


Assignee: Yang Jie

> Add a new checkstyle rule to prohibit the use of `@Test(expected = 
> SomeException.class)`
> 
>
> Key: SPARK-44314
> URL: https://issues.apache.org/jira/browse/SPARK-44314
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> [https://github.com/junit-team/junit4/wiki/Exception-testing]
>  
> {code:java}
> The expected parameter should be used with care. The above test will pass if 
> any code in the method throws IndexOutOfBoundsException. Using the method you 
> also cannot test the value of the message in the exception, or the state of a 
> domain object after the exception has been thrown. For these reasons, the 
> previous approaches are recommended. {code}






[jira] [Assigned] (SPARK-44316) Upgrade Jersey to 2.40

2023-07-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44316:
-

Assignee: BingKun Pan

> Upgrade Jersey to 2.40
> --
>
> Key: SPARK-44316
> URL: https://issues.apache.org/jira/browse/SPARK-44316
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44316) Upgrade Jersey to 2.40

2023-07-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44316.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41874
[https://github.com/apache/spark/pull/41874]

> Upgrade Jersey to 2.40
> --
>
> Key: SPARK-44316
> URL: https://issues.apache.org/jira/browse/SPARK-44316
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-44316) Upgrade Jersey to 2.40

2023-07-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44316:
--
Parent: SPARK-43831
Issue Type: Sub-task  (was: Improvement)

> Upgrade Jersey to 2.40
> --
>
> Key: SPARK-44316
> URL: https://issues.apache.org/jira/browse/SPARK-44316
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT

2023-07-06 Thread koert kuipers (Jira)
koert kuipers created SPARK-44323:
-

 Summary: Scala None shows up as null for Aggregator BUF or OUT  
 Key: SPARK-44323
 URL: https://issues.apache.org/jira/browse/SPARK-44323
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1
Reporter: koert kuipers


When doing an upgrade from Spark 3.3.1 to Spark 3.4.1, we suddenly started 
getting null pointer exceptions in Aggregators (classes extending 
org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF 
and/or OUT. Basically, None is now showing up as null.

After adding a simple test case and doing a binary search on commits, we landed 
on SPARK-37829 being the cause.






[jira] [Updated] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT

2023-07-06 Thread koert kuipers (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

koert kuipers updated SPARK-44323:
--
Description: 
When doing an upgrade from Spark 3.3.1 to Spark 3.4.1, we suddenly started 
getting null pointer exceptions in Aggregators (classes extending 
org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF 
and/or OUT. Basically, None is now showing up as null.

After adding a simple test case and doing a binary search on commits, we landed 
on SPARK-37829 being the cause.

We observed the issue at first with an NPE inside Aggregator.merge because None 
was null. I am having a hard time replicating that in a Spark unit test, but I 
did manage to get a None to become null in the output. A simple test that now 
fails:

 
{code:java}
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
index e9daa825dd4..a1959d7065d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
@@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, 
Int] {
   def outputEncoder: Encoder[Int] = Encoders.scalaInt
 }
 
+object OptionStringAgg extends Aggregator[Option[String], Option[String], 
Option[String]] {
+  override def zero: Option[String] = None
+  override def reduce(b: Option[String], a: Option[String]): Option[String] = 
merge(b, a)
+  override def finish(reduction: Option[String]): Option[String] = reduction
+  override def merge(b1: Option[String], b2: Option[String]): Option[String] =
+    b1.map{ b1v => b2.map{ b2v => b1v ++ b2v }.getOrElse(b1v) }.orElse(b2)
+  override def bufferEncoder: Encoder[Option[String]] = ExpressionEncoder()
+  override def outputEncoder: Encoder[Option[String]] = ExpressionEncoder()
+}
+
 class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
   import testImplicits._
 
@@ -432,4 +442,15 @@ class DatasetAggregatorSuite extends QueryTest with 
SharedSparkSession {
     val agg = df.select(mode(col("a"))).as[String]
     checkDataset(agg, "3")
   }
+
+  test("typed aggregation: option string") {
+    val ds = Seq((1, Some("a")), (1, None), (1, Some("c")), (2, None)).toDS()
+
+    checkDataset(
+      ds.groupByKey(_._1).mapValues(_._2).agg(
+        OptionStringAgg.toColumn
+      ),
+      (1, Some("ac")), (2, None)
+    )
+  }
 }
 {code}

  was:
When doing an upgrade from Spark 3.3.1 to Spark 3.4.1, we suddenly started 
getting null pointer exceptions in Aggregators (classes extending 
org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF 
and/or OUT. Basically, None is now showing up as null.

After adding a simple test case and doing a binary search on commits, we landed 
on SPARK-37829 being the cause.


> Scala None shows up as null for Aggregator BUF or OUT  
> ---
>
> Key: SPARK-44323
> URL: https://issues.apache.org/jira/browse/SPARK-44323
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: koert kuipers
>Priority: Major
>
> When doing an upgrade from Spark 3.3.1 to Spark 3.4.1, we suddenly started 
> getting null pointer exceptions in Aggregators (classes extending 
> org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF 
> and/or OUT. Basically, None is now showing up as null.
> After adding a simple test case and doing a binary search on commits, we 
> landed on SPARK-37829 being the cause.
> We observed the issue at first with an NPE inside Aggregator.merge because None 
> was null. I am having a hard time replicating that in a Spark unit test, but 
> I did manage to get a None to become null in the output. A simple test that now 
> fails:
>  
> {code:java}
> diff --git 
> a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala 
> b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> index e9daa825dd4..a1959d7065d 100644
> --- 
> a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> +++ 
> b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> @@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, 
> Int] {
>    def outputEncoder: Encoder[Int] = Encoders.scalaInt
>  }
>  
> +object OptionStringAgg extends Aggregator[Option[String], Option[String], 
> Option[String]] {
> +  override def zero: Option[String] = None
> +  override def reduce(b: Option[String], a: Option[String]): Option[String] 
> = merge(b, a)
> +  override def finish(reduction: Option[String]): Option[String] = reduction
> +  override def merge(b1: Option[String], b2: Option[String]): Option[String] 
> =
> +    b1.map{ b1v => b2.map{ b