[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740726#comment-17740726 ]

koert kuipers commented on SPARK-37829:
---

Since this behavior of returning a Row with null values has been present since Spark 3.0.x (a major breaking release, three years ago), I would argue that it is the default behavior and that this JIRA introduces a breaking change.

> An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Reporter: Clément de Groc
> Assignee: Jason Xu
> Priority: Major
> Fix For: 3.3.3, 3.4.1, 3.5.0
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+.
> The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5], which succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case classes work as expected, as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the migration guide, so I'm assuming this is a bug.
> A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem.
> A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.
[jira] [Comment Edited] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740726#comment-17740726 ]

koert kuipers edited comment on SPARK-37829 at 7/6/23 5:23 PM:
---

Since this behavior of returning a Row with null values has been present since Spark 3.0.x (a major breaking release, three years ago), I would argue that it is the default behavior and that this JIRA introduces a breaking change. Also, ExpressionEncoders are used for purposes other than Dataset joins, and we now find nulls popping up in places they should not.

was (Author: koert):
Since this behavior of returning a Row with null values has been present since Spark 3.0.x (a major breaking release, three years ago), I would argue that it is the default behavior and that this JIRA introduces a breaking change.
[jira] [Comment Edited] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740726#comment-17740726 ]

koert kuipers edited comment on SPARK-37829 at 7/6/23 5:32 PM:
---

Since this (admittedly somewhat weird) behavior of returning a Row with null values has been present since Spark 3.0.x (a major breaking release, three years ago), I would argue that it is the default behavior and that this JIRA introduces a breaking change. Also, ExpressionEncoders are used for purposes other than Dataset joins, and we now find nulls popping up in places they should not.

was (Author: koert):
Since this behavior of returning a Row with null values has been present since Spark 3.0.x (a major breaking release, three years ago), I would argue that it is the default behavior and that this JIRA introduces a breaking change. Also, ExpressionEncoders are used for purposes other than Dataset joins, and we now find nulls popping up in places they should not.
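For readers following along, here is a minimal sketch of the behavior under discussion (session setup and data are illustrative assumptions, not the test linked in the ticket): a full_outer joinWith between two untyped DataFrames, where the missing side came back as null in Spark 2.4.8 but as a Row of nulls in Spark 3.0.0 through 3.4.0.

{code:java}
import org.apache.spark.sql.SparkSession

// Hypothetical repro sketch for SPARK-37829.
val spark = SparkSession.builder().master("local[*]").appName("joinWith-sketch").getOrCreate()
import spark.implicits._

val left  = Seq((1, "a")).toDF("id", "l")
val right = Seq((2, "b")).toDF("id", "r")

// Dataset[(Row, Row)]: each element pairs the two sides of the join.
val joined = left.joinWith(right, left("id") === right("id"), "full_outer")

// Per the ticket: Spark 2.4.8 printed e.g. ([1,a],null), while
// Spark 3.0.0-3.4.0 printed ([1,a],[null,null]) -- the difference the
// revert debate above is about.
joined.collect().foreach { case (l, r) => println(s"$l | $r") }
{code}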
[jira] [Created] (SPARK-44324) Move CaseInsensitiveMap to sql/api
Rui Wang created SPARK-44324:
---
Summary: Move CaseInsensitiveMap to sql/api
Key: SPARK-44324
URL: https://issues.apache.org/jira/browse/SPARK-44324
Project: Spark
Issue Type: Sub-task
Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang
[jira] [Comment Edited] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT
[ https://issues.apache.org/jira/browse/SPARK-44323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740744#comment-17740744 ]

koert kuipers edited comment on SPARK-44323 at 7/6/23 6:28 PM:
---

I think the issue is that Nones inside tuples now become nulls. So it's the usage of nullSafe inside the childrenDeserializers for tuples introduced in [https://github.com/apache/spark/pull/40755]

was (Author: koert):
I think the issue is that Nones inside tuples now become null. So it's the usage of nullSafe inside the childrenDeserializers for tuples introduced in https://github.com/apache/spark/pull/40755

> Scala None shows up as null for Aggregator BUF or OUT
> ---
>
> Key: SPARK-44323
> URL: https://issues.apache.org/jira/browse/SPARK-44323
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: koert kuipers
> Priority: Major
>
> When doing an upgrade from Spark 3.3.1 to Spark 3.4.1 we suddenly started getting null pointer exceptions in Aggregators (classes extending org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF and/or OUT. Basically, None is now showing up as null.
> After adding a simple test case and doing a binary search on commits, we landed on SPARK-37829 being the cause.
> We observed the issue at first with an NPE inside Aggregator.merge because None was null. I am having a hard time replicating that in a Spark unit test, but I did manage to get a None become null in the output. Simple test that now fails:
>
> {code:java}
> diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> index e9daa825dd4..a1959d7065d 100644
> --- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> +++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
> @@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, Int] {
>    def outputEncoder: Encoder[Int] = Encoders.scalaInt
>  }
>
> +object OptionStringAgg extends Aggregator[Option[String], Option[String], Option[String]] {
> +  override def zero: Option[String] = None
> +  override def reduce(b: Option[String], a: Option[String]): Option[String] = merge(b, a)
> +  override def finish(reduction: Option[String]): Option[String] = reduction
> +  override def merge(b1: Option[String], b2: Option[String]): Option[String] =
> +    b1.map{ b1v => b2.map{ b2v => b1v ++ b2v }.getOrElse(b1v) }.orElse(b2)
> +  override def bufferEncoder: Encoder[Option[String]] = ExpressionEncoder()
> +  override def outputEncoder: Encoder[Option[String]] = ExpressionEncoder()
> +}
> +
>  class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
>    import testImplicits._
>
> @@ -432,4 +442,15 @@ class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
>      val agg = df.select(mode(col("a"))).as[String]
>      checkDataset(agg, "3")
>    }
> +
> +  test("typed aggregation: option string") {
> +    val ds = Seq((1, Some("a")), (1, None), (1, Some("c")), (2, None)).toDS()
> +
> +    checkDataset(
> +      ds.groupByKey(_._1).mapValues(_._2).agg(
> +        OptionStringAgg.toColumn
> +      ),
> +      (1, Some("ac")), (2, None)
> +    )
> +  }
>  }
> {code}
[jira] [Commented] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT
[ https://issues.apache.org/jira/browse/SPARK-44323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740744#comment-17740744 ]

koert kuipers commented on SPARK-44323:
---

I think the issue is that Nones inside tuples now become null. So it's the usage of nullSafe inside the childrenDeserializers for tuples introduced in https://github.com/apache/spark/pull/40755
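A minimal sketch of the round-trip the comment describes (the data and transformation are illustrative; the OptionStringAgg test quoted above is the authoritative repro): an Option nested inside a tuple is encoded and decoded, and under the change cited above the None reportedly surfaces as null.

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq((1, Option("a")), (2, Option.empty[String])).toDS()

// map(identity) forces a serialize/deserialize round-trip through the
// tuple ExpressionEncoder whose childrenDeserializers are wrapped in
// nullSafe by the PR referenced in the comment.
ds.map(identity).collect().foreach(println)
// Expected: (1,Some(a)) and (2,None); the report is that the None side
// can come back as null.
{code}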
[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT
[ https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740789#comment-17740789 ]

Serge Rielau commented on SPARK-43438:
---

spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.

> Fix mismatched column list error on INSERT
> ---
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Serge Rielau
> Priority: Major
>
> This error message is pretty bad, and common:
> "_LEGACY_ERROR_TEMP_1038" : {
>   "message" : [
>     "Cannot write to table due to mismatched user specified column size() and data column size()."
>   ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS:
> "_LEGACY_ERROR_TEMP_1168" : {
>   "message" : [
>     " requires that the data to be inserted have the same number of columns as the target table: target table has  column(s) but the inserted data has  column(s), including  partition column(s) having constant value(s)."
>   ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and data column size(3).; line 1 pos 24
[jira] [Comment Edited] (SPARK-43438) Fix mismatched column list error on INSERT
[ https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740789#comment-17740789 ]

Serge Rielau edited comment on SPARK-43438 at 7/6/23 8:17 PM:
---

spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.

was (Author: JIRAUSER288374):
spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.
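The same repro, sketched through the Scala API for anyone exercising both error paths (the USING clause and session setup are assumptions, not part of the report):

{code:java}
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

spark.sql("CREATE TABLE tabtest(c1 INT, c2 INT) USING parquet")

// Too few columns for the whole table -> the _LEGACY_ERROR_TEMP_1168 text.
try spark.sql("INSERT INTO tabtest SELECT 1")
catch { case e: AnalysisException => println(e.getMessage) }

// Explicit one-column list fed three columns -> the _LEGACY_ERROR_TEMP_1038 text.
try spark.sql("INSERT INTO tabtest(c1) SELECT 1, 2, 3")
catch { case e: AnalysisException => println(e.getMessage) }
{code}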
[jira] [Resolved] (SPARK-43321) Impl Dataset#JoinWith
[ https://issues.apache.org/jira/browse/SPARK-43321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-43321.
---
Fix Version/s: 3.5.0
Assignee: Zhen Li
Resolution: Fixed

> Impl Dataset#JoinWith
> ---
>
> Key: SPARK-43321
> URL: https://issues.apache.org/jira/browse/SPARK-43321
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Zhen Li
> Assignee: Zhen Li
> Priority: Major
> Fix For: 3.5.0
>
> Impl missing method JoinWith
[jira] [Created] (SPARK-44325) Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec
Vinod KC created SPARK-44325:
---
Summary: Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec
Key: SPARK-44325
URL: https://issues.apache.org/jira/browse/SPARK-44325
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC

Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec
[jira] [Commented] (SPARK-44325) Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec
[ https://issues.apache.org/jira/browse/SPARK-44325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740836#comment-17740836 ]

ci-cassandra.apache.org commented on SPARK-44325:
---

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/41884

> Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec
> ---
>
> Key: SPARK-44325
> URL: https://issues.apache.org/jira/browse/SPARK-44325
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec
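For context, the PartitionEvaluator API referenced by this sub-task (added to org.apache.spark in the 3.5 line) separates per-partition compute logic from the physical operator. A toy sketch of the API shape only; the real SortMergeJoinExec evaluator carries the merge-join over sorted input iterators instead:

{code:java}
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}

// Trivial evaluator: upper-cases every record of a partition.
class UpperCaseEvaluator extends PartitionEvaluator[String, String] {
  override def eval(partitionIndex: Int, inputs: Iterator[String]*): Iterator[String] =
    inputs.head.map(_.toUpperCase)
}

// The factory is what gets shipped to executors.
class UpperCaseEvaluatorFactory extends PartitionEvaluatorFactory[String, String] {
  override def createEvaluator(): PartitionEvaluator[String, String] =
    new UpperCaseEvaluator
}

// Usage, assuming an existing RDD[String] named `rdd`:
// val out = rdd.mapPartitionsWithEvaluator(new UpperCaseEvaluatorFactory)
{code}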
[jira] [Resolved] (SPARK-44315) Move DefinedByConstructorParams to sql/api
[ https://issues.apache.org/jira/browse/SPARK-44315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-44315.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41873
[https://github.com/apache/spark/pull/41873]

> Move DefinedByConstructorParams to sql/api
> ---
>
> Key: SPARK-44315
> URL: https://issues.apache.org/jira/browse/SPARK-44315
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, SQL
> Affects Versions: 3.5.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.5.0
[jira] [Created] (SPARK-44326) Move utils that are used from Scala client to the common modules
Rui Wang created SPARK-44326:
---
Summary: Move utils that are used from Scala client to the common modules
Key: SPARK-44326
URL: https://issues.apache.org/jira/browse/SPARK-44326
Project: Spark
Issue Type: Sub-task
Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang
[jira] [Assigned] (SPARK-43660) Enable `resample` with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-43660:
---
Assignee: Haejoon Lee

> Enable `resample` with Spark Connect
> ---
>
> Key: SPARK-43660
> URL: https://issues.apache.org/jira/browse/SPARK-43660
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
>
> Enable `resample` with Spark Connect
[jira] [Resolved] (SPARK-43660) Enable `resample` with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43660.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41877
[https://github.com/apache/spark/pull/41877]

> Enable `resample` with Spark Connect
> ---
>
> Key: SPARK-43660
> URL: https://issues.apache.org/jira/browse/SPARK-43660
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.5.0
>
> Enable `resample` with Spark Connect
[jira] [Created] (SPARK-44327) Add functions any and len to Scala
Ruifeng Zheng created SPARK-44327:
---
Summary: Add functions any and len to Scala
Key: SPARK-44327
URL: https://issues.apache.org/jira/browse/SPARK-44327
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng
[jira] [Created] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329]
jiaan.geng created SPARK-44328:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329]
Key: SPARK-44328
URL: https://issues.apache.org/jira/browse/SPARK-44328
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng
[jira] [Updated] (SPARK-44275) Support client-side retries in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44275:
---
Affects Version/s: 3.5.0
(was: 3.4.1)

> Support client-side retries in Spark Connect Scala client
> ---
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Robert Dillitz
> Priority: Major
>
> Add a configurable retry mechanism to the Scala Connect client similar to the one in the Python client.
[jira] [Resolved] (SPARK-44275) Support client-side retries in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44275.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41829
[https://github.com/apache/spark/pull/41829]

> Support client-side retries in Spark Connect Scala client
> ---
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Robert Dillitz
> Assignee: Robert Dillitz
> Priority: Major
> Fix For: 3.5.0
>
> Add a configurable retry mechanism to the Scala Connect client similar to the one in the Python client.
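For readers who want the gist without reading the PR: the mechanism is retry-with-backoff around each RPC. A generic sketch of the idea only; parameter names, defaults, and the retry predicate here are illustrative, not the actual client configuration:

{code:java}
import scala.util.control.NonFatal

// Illustrative retry loop; the real client also restricts which gRPC
// status codes are considered retryable.
def retry[T](maxRetries: Int = 4, initialBackoffMs: Long = 50, multiplier: Double = 4.0)
            (shouldRetry: Throwable => Boolean)(fn: => T): T = {
  var attempt = 0
  var backoffMs = initialBackoffMs
  while (true) {
    try {
      return fn
    } catch {
      case NonFatal(e) if attempt < maxRetries && shouldRetry(e) =>
        Thread.sleep(backoffMs)          // back off, then try again
        attempt += 1
        backoffMs = (backoffMs * multiplier).toLong
    }
  }
  throw new IllegalStateException("unreachable")
}

// Example use (doRpc is a placeholder):
// retry()(_.isInstanceOf[java.io.IOException]) { doRpc() }
{code}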
[jira] [Assigned] (SPARK-44275) Support client-side retries in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44275:
---
Assignee: Robert Dillitz

> Support client-side retries in Spark Connect Scala client
> ---
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Robert Dillitz
> Assignee: Robert Dillitz
> Priority: Major
>
> Add a configurable retry mechanism to the Scala Connect client similar to the one in the Python client.
[jira] [Resolved] (SPARK-44312) [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent
[ https://issues.apache.org/jira/browse/SPARK-44312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44312.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41866
[https://github.com/apache/spark/pull/41866]

> [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent
> ---
>
> Key: SPARK-44312
> URL: https://issues.apache.org/jira/browse/SPARK-44312
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Robert Dillitz
> Assignee: Robert Dillitz
> Priority: Major
> Fix For: 3.5.0
>
> Allow us to prepend a Spark Connect user agent with an environment variable: *SPARK_CONNECT_USER_AGENT*
[jira] [Assigned] (SPARK-44312) [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent
[ https://issues.apache.org/jira/browse/SPARK-44312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44312:
---
Assignee: Robert Dillitz

> [PYTHON][CONNECT] Use SPARK_CONNECT_USER_AGENT environment variable for the user agent
> ---
>
> Key: SPARK-44312
> URL: https://issues.apache.org/jira/browse/SPARK-44312
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Robert Dillitz
> Assignee: Robert Dillitz
> Priority: Major
>
> Allow us to prepend a Spark Connect user agent with an environment variable: *SPARK_CONNECT_USER_AGENT*
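A sketch of what honoring such a variable can look like on the client side. The default agent string below is a hypothetical placeholder, not the real constant, and the exact semantics in the PR may differ:

{code:java}
// Hypothetical client-side handling of SPARK_CONNECT_USER_AGENT.
val defaultAgent = "_SPARK_CONNECT_SCALA"  // placeholder default

val userAgent: String =
  sys.env.get("SPARK_CONNECT_USER_AGENT") match {
    case Some(custom) => s"$custom $defaultAgent"  // prepend, per the description
    case None         => defaultAgent
  }
{code}

The variable would be set before starting the client, e.g. `export SPARK_CONNECT_USER_AGENT=my-app`.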
[jira] [Created] (SPARK-44329) Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and Python
Ruifeng Zheng created SPARK-44329:
---
Summary: Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and Python
Key: SPARK-44329
URL: https://issues.apache.org/jira/browse/SPARK-44329
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark, SQL
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng
[jira] [Created] (SPARK-44330) Define the computing logic through PartitionEvaluator API and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec
Vinod KC created SPARK-44330:
---
Summary: Define the computing logic through PartitionEvaluator API and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec
Key: SPARK-44330
URL: https://issues.apache.org/jira/browse/SPARK-44330
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC

Define the computing logic through PartitionEvaluator API and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec
[jira] [Created] (SPARK-44331) Add bitmap functions to Scala and Python
Ruifeng Zheng created SPARK-44331:
---
Summary: Add bitmap functions to Scala and Python
Key: SPARK-44331
URL: https://issues.apache.org/jira/browse/SPARK-44331
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Ruifeng Zheng

* bitmap_bucket_number
* bitmap_bit_position
* bitmap_construct_agg
* bitmap_count
* bitmap_or_agg
[jira] [Assigned] (SPARK-44327) Add functions any and len to Scala
[ https://issues.apache.org/jira/browse/SPARK-44327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-44327:
---
Assignee: Ruifeng Zheng

> Add functions any and len to Scala
> ---
>
> Key: SPARK-44327
> URL: https://issues.apache.org/jira/browse/SPARK-44327
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
[jira] [Updated] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
[ https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-44328:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
(was: Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2329])

> Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
> ---
>
> Key: SPARK-44328
> URL: https://issues.apache.org/jira/browse/SPARK-44328
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Created] (SPARK-44332) The Executor ID should start with 1 when running on Spark cluster of [N, cores, memory] locally mode
BingKun Pan created SPARK-44332:
---
Summary: The Executor ID should start with 1 when running on Spark cluster of [N, cores, memory] locally mode
Key: SPARK-44332
URL: https://issues.apache.org/jira/browse/SPARK-44332
Project: Spark
Issue Type: Bug
Components: Spark Core, Web UI
Affects Versions: 3.5.0
Reporter: BingKun Pan
[jira] [Assigned] (SPARK-44299) Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
[ https://issues.apache.org/jira/browse/SPARK-44299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-44299:
---
Assignee: BingKun Pan

> Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
> ---
>
> Key: SPARK-44299
> URL: https://issues.apache.org/jira/browse/SPARK-44299
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
[jira] [Resolved] (SPARK-44299) Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
[ https://issues.apache.org/jira/browse/SPARK-44299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-44299.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41858
[https://github.com/apache/spark/pull/41858]

> Assign names to the error class _LEGACY_ERROR_TEMP_227[4-6,8]
> ---
>
> Key: SPARK-44299
> URL: https://issues.apache.org/jira/browse/SPARK-44299
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
[jira] [Created] (SPARK-44319) Migrate jersey 2 to jersey 3
Yang Jie created SPARK-44319:
---
Summary: Migrate jersey 2 to jersey 3
Key: SPARK-44319
URL: https://issues.apache.org/jira/browse/SPARK-44319
Project: Spark
Issue Type: Sub-task
Components: Build, Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie
[jira] [Commented] (SPARK-44319) Migrate jersey 2 to jersey 3
[ https://issues.apache.org/jira/browse/SPARK-44319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740463#comment-17740463 ]

BingKun Pan commented on SPARK-44319:
---

👍🏻

> Migrate jersey 2 to jersey 3
> ---
>
> Key: SPARK-44319
> URL: https://issues.apache.org/jira/browse/SPARK-44319
> Project: Spark
> Issue Type: Sub-task
> Components: Build, Spark Core
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Minor
[jira] [Created] (SPARK-44320) Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277
BingKun Pan created SPARK-44320:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277
Key: SPARK-44320
URL: https://issues.apache.org/jira/browse/SPARK-44320
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: BingKun Pan
[jira] [Updated] (SPARK-44320) Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
[ https://issues.apache.org/jira/browse/SPARK-44320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan updated SPARK-44320:
---
Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
(was: Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277)

> Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]
> ---
>
> Key: SPARK-44320
> URL: https://issues.apache.org/jira/browse/SPARK-44320
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Commented] (SPARK-44307) Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.
[ https://issues.apache.org/jira/browse/SPARK-44307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740565#comment-17740565 ]

Nikita Awasthi commented on SPARK-44307:
---

User 'maheshk114' has created a pull request for this issue:
https://github.com/apache/spark/pull/41860

> Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.
> ---
>
> Key: SPARK-44307
> URL: https://issues.apache.org/jira/browse/SPARK-44307
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.4.1
> Reporter: mahesh kumar behera
> Priority: Major
> Fix For: 3.5.0
>
> In the case of a left outer join, even if the left side table is small enough to be broadcast, a shuffle join is used. This is a property of the left outer join: if its left side were broadcast, the result generated would be wrong. But this is not taken care of in the bloom filter. While injecting the bloom filter, if the left side is smaller than the broadcast threshold, the bloom filter is not added: the logic assumes that the left side will be broadcast and that there is therefore no need for a bloom filter. This causes the bloom filter optimization to be missed in the case of a left outer join with a small left side and a huge right-side table.
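A sketch of the shape being described (table sizes, names, and the conf key are illustrative assumptions): the left side is small enough to broadcast in an inner join, but a LEFT OUTER join cannot broadcast its left side, so skipping the bloom filter on that size heuristic leaves the large right-side scan unfiltered.

{code:java}
// Hypothetical illustration of the reported miss; assumes an existing
// SparkSession named `spark`.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "true")

val small = spark.range(0, 1000L).withColumnRenamed("id", "k")        // under broadcast threshold
val huge  = spark.range(0, 10000000000L).withColumnRenamed("id", "k") // far over it

// LEFT OUTER keeps unmatched left rows, so the small side cannot be broadcast.
val joined = small.join(huge, Seq("k"), "left_outer")

// Per this issue, the plan shows a shuffle join with no runtime bloom
// filter applied to the scan of `huge`.
joined.explain()
{code}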
[jira] [Commented] (SPARK-44275) Support client-side retries in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-44275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740568#comment-17740568 ]

Nikita Awasthi commented on SPARK-44275:
---

User 'dillitz' has created a pull request for this issue:
https://github.com/apache/spark/pull/41829

> Support client-side retries in Spark Connect Scala client
> ---
>
> Key: SPARK-44275
> URL: https://issues.apache.org/jira/browse/SPARK-44275
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Robert Dillitz
> Priority: Major
>
> Add a configurable retry mechanism to the Scala Connect client similar to the one in the Python client.
[jira] [Resolved] (SPARK-44284) Introduce simple conf system for sql/api
[ https://issues.apache.org/jira/browse/SPARK-44284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44284.
---
Fix Version/s: 3.5.0
Resolution: Fixed

> Introduce simple conf system for sql/api
> ---
>
> Key: SPARK-44284
> URL: https://issues.apache.org/jira/browse/SPARK-44284
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
> Fix For: 3.5.0
>
> Create a simple conf system for classes in sql/api
[jira] [Resolved] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
[ https://issues.apache.org/jira/browse/SPARK-44303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-44303.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41863
[https://github.com/apache/spark/pull/41863]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
> ---
>
> Key: SPARK-44303
> URL: https://issues.apache.org/jira/browse/SPARK-44303
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Assignee: jiaan.geng
> Priority: Major
> Fix For: 3.5.0
[jira] [Assigned] (SPARK-44303) Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
[ https://issues.apache.org/jira/browse/SPARK-44303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-44303:
---
Assignee: jiaan.geng

> Assign names to the error class _LEGACY_ERROR_TEMP_[2320-2324]
> ---
>
> Key: SPARK-44303
> URL: https://issues.apache.org/jira/browse/SPARK-44303
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Assignee: jiaan.geng
> Priority: Major
[jira] [Created] (SPARK-44321) Move ParseException to SQL/API
Herman van Hövell created SPARK-44321:
---
Summary: Move ParseException to SQL/API
Key: SPARK-44321
URL: https://issues.apache.org/jira/browse/SPARK-44321
Project: Spark
Issue Type: New Feature
Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Herman van Hövell
Assignee: Herman van Hövell
[jira] [Resolved] (SPARK-44283) Move Origin to api
[ https://issues.apache.org/jira/browse/SPARK-44283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44283.
---
Fix Version/s: 3.5.0
Resolution: Fixed

> Move Origin to api
> ---
>
> Key: SPARK-44283
> URL: https://issues.apache.org/jira/browse/SPARK-44283
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
> Fix For: 3.5.0
[jira] [Created] (SPARK-44322) Make parser use SqlApiConf
Herman van Hövell created SPARK-44322:
---
Summary: Make parser use SqlApiConf
Key: SPARK-44322
URL: https://issues.apache.org/jira/browse/SPARK-44322
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.4.1
Reporter: Herman van Hövell
Assignee: Herman van Hövell
[jira] [Resolved] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
[ https://issues.apache.org/jira/browse/SPARK-44314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-44314.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41872
[https://github.com/apache/spark/pull/41872]

> Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
> ---
>
> Key: SPARK-44314
> URL: https://issues.apache.org/jira/browse/SPARK-44314
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
> Fix For: 3.5.0
>
> [https://github.com/junit-team/junit4/wiki/Exception-testing]
>
> {code:java}
> The expected parameter should be used with care. The above test will pass if any code in the method throws IndexOutOfBoundsException. Using the method you also cannot test the value of the message in the exception, or the state of a domain object after the exception has been thrown. For these reasons, the previous approaches are recommended. {code}
[jira] [Assigned] (SPARK-44314) Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
[ https://issues.apache.org/jira/browse/SPARK-44314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie reassigned SPARK-44314:
---
Assignee: Yang Jie

> Add a new checkstyle rule to prohibit the use of `@Test(expected = SomeException.class)`
> ---
>
> Key: SPARK-44314
> URL: https://issues.apache.org/jira/browse/SPARK-44314
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
>
> [https://github.com/junit-team/junit4/wiki/Exception-testing]
>
> {code:java}
> The expected parameter should be used with care. The above test will pass if any code in the method throws IndexOutOfBoundsException. Using the method you also cannot test the value of the message in the exception, or the state of a domain object after the exception has been thrown. For these reasons, the previous approaches are recommended. {code}
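As a concrete before/after for the rule, sketched here in Scala with JUnit 4.13's assertThrows (the project's actual Java tests would look analogous):

{code:java}
import org.junit.Assert.assertThrows
import org.junit.Test

class SubstringSuite {
  // Discouraged: @Test(expected = StringIndexOutOfBoundsException.class)
  // passes if *any* statement in the method throws, and the exception
  // itself cannot be inspected.

  @Test
  def substringOutOfBounds(): Unit = {
    // Preferred: the expectation is scoped to exactly one statement,
    // and the returned exception can be asserted on.
    val ex = assertThrows(
      classOf[StringIndexOutOfBoundsException],
      () => "spark".substring(99))
    assert(ex.getMessage != null)
  }
}
{code}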
[jira] [Assigned] (SPARK-44316) Upgrade Jersey to 2.40
[ https://issues.apache.org/jira/browse/SPARK-44316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-44316:
---
Assignee: BingKun Pan

> Upgrade Jersey to 2.40
> ---
>
> Key: SPARK-44316
> URL: https://issues.apache.org/jira/browse/SPARK-44316
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
[jira] [Resolved] (SPARK-44316) Upgrade Jersey to 2.40
[ https://issues.apache.org/jira/browse/SPARK-44316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-44316.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 41874
[https://github.com/apache/spark/pull/41874]

> Upgrade Jersey to 2.40
> ---
>
> Key: SPARK-44316
> URL: https://issues.apache.org/jira/browse/SPARK-44316
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
[jira] [Updated] (SPARK-44316) Upgrade Jersey to 2.40
[ https://issues.apache.org/jira/browse/SPARK-44316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-44316:
---
Parent: SPARK-43831
Issue Type: Sub-task
(was: Improvement)

> Upgrade Jersey to 2.40
> ---
>
> Key: SPARK-44316
> URL: https://issues.apache.org/jira/browse/SPARK-44316
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 3.5.0
[jira] [Created] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT
koert kuipers created SPARK-44323:
---
Summary: Scala None shows up as null for Aggregator BUF or OUT
Key: SPARK-44323
URL: https://issues.apache.org/jira/browse/SPARK-44323
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.1
Reporter: koert kuipers

When doing an upgrade from Spark 3.3.1 to Spark 3.4.1 we suddenly started getting null pointer exceptions in Aggregators (classes extending org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF and/or OUT. Basically, None is now showing up as null.
After adding a simple test case and doing a binary search on commits, we landed on SPARK-37829 being the cause.
[jira] [Updated] (SPARK-44323) Scala None shows up as null for Aggregator BUF or OUT
[ https://issues.apache.org/jira/browse/SPARK-44323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

koert kuipers updated SPARK-44323:
---
Description:
When doing an upgrade from Spark 3.3.1 to Spark 3.4.1 we suddenly started getting null pointer exceptions in Aggregators (classes extending org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF and/or OUT. Basically, None is now showing up as null.
After adding a simple test case and doing a binary search on commits, we landed on SPARK-37829 being the cause.
We observed the issue at first with an NPE inside Aggregator.merge because None was null. I am having a hard time replicating that in a Spark unit test, but I did manage to get a None become null in the output. Simple test that now fails:

{code:java}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
index e9daa825dd4..a1959d7065d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala
@@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, Int] {
   def outputEncoder: Encoder[Int] = Encoders.scalaInt
 }

+object OptionStringAgg extends Aggregator[Option[String], Option[String], Option[String]] {
+  override def zero: Option[String] = None
+  override def reduce(b: Option[String], a: Option[String]): Option[String] = merge(b, a)
+  override def finish(reduction: Option[String]): Option[String] = reduction
+  override def merge(b1: Option[String], b2: Option[String]): Option[String] =
+    b1.map{ b1v => b2.map{ b2v => b1v ++ b2v }.getOrElse(b1v) }.orElse(b2)
+  override def bufferEncoder: Encoder[Option[String]] = ExpressionEncoder()
+  override def outputEncoder: Encoder[Option[String]] = ExpressionEncoder()
+}
+
 class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
   import testImplicits._

@@ -432,4 +442,15 @@ class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
     val agg = df.select(mode(col("a"))).as[String]
     checkDataset(agg, "3")
   }
+
+  test("typed aggregation: option string") {
+    val ds = Seq((1, Some("a")), (1, None), (1, Some("c")), (2, None)).toDS()
+
+    checkDataset(
+      ds.groupByKey(_._1).mapValues(_._2).agg(
+        OptionStringAgg.toColumn
+      ),
+      (1, Some("ac")), (2, None)
+    )
+  }
 }
{code}

was:
When doing an upgrade from Spark 3.3.1 to Spark 3.4.1 we suddenly started getting null pointer exceptions in Aggregators (classes extending org.apache.spark.sql.expressions.Aggregator) that use Scala Option for BUF and/or OUT. Basically, None is now showing up as null.
After adding a simple test case and doing a binary search on commits, we landed on SPARK-37829 being the cause.

> Scala None shows up as null for Aggregator BUF or OUT
> ---
>
> Key: SPARK-44323
> URL: https://issues.apache.org/jira/browse/SPARK-44323
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: koert kuipers
> Priority: Major
> we observed the issue at first with NPE inside Aggregator.merge because None > was null. i am having a hard time replicating that in a spark unit test, but > i did manage to get a None become null in the output. simple test that now > fails: > > {code:java} > diff --git > a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala > b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala > index e9daa825dd4..a1959d7065d 100644 > --- > a/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala > +++ > b/sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala > @@ -228,6 +228,16 @@ case class FooAgg(s: Int) extends Aggregator[Row, Int, > Int] { > def outputEncoder: Encoder[Int] = Encoders.scalaInt > } > > +object OptionStringAgg extends Aggregator[Option[String], Option[String], > Option[String]] { > + override def zero: Option[String] = None > + override def reduce(b: Option[String], a: Option[String]): Option[String] > = merge(b, a) > + override def finish(reduction: Option[String]): Option[String] = reduction > + override def merge(b1: Option[String], b2: Option[String]): Option[String] > = > + b1.map{ b1v => b2.map{ b