[jira] [Updated] (SPARK-46962) Implement python worker to run python streaming data source
     [ https://issues.apache.org/jira/browse/SPARK-46962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-46962:
---------------------------------
    Component/s: Structured Streaming
                 (was: SS)

> Implement python worker to run python streaming data source
> ------------------------------------------------------------
>
>                 Key: SPARK-46962
>                 URL: https://issues.apache.org/jira/browse/SPARK-46962
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Chaoqin Li
>            Priority: Major
>              Labels: pull-request-available
>
> Implement a Python worker to run Python streaming data sources and
> communicate with the JVM through a socket. Create a PythonMicrobatchStream
> to invoke RPC function calls.
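For context, the shape of such a worker is roughly a dispatch loop: the JVM
opens a socket, writes a function code, and the Python side invokes the
corresponding reader method and writes the result back. The sketch below is
purely illustrative; the function codes, framing, and reader interface are
assumptions for illustration, not the protocol from the actual change.

{code:python}
import socket
import struct

# Hypothetical function codes; the real values are defined by the protocol
# shared between the JVM and the Python worker.
FUNC_STOP = 0
FUNC_LATEST_OFFSET = 1

def serve(reader, port: int) -> None:
    """Dispatch RPC-style calls from the JVM to a streaming reader."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        f = sock.makefile("rwb")
        while True:
            (func,) = struct.unpack(">i", f.read(4))  # JVM writes a 4-byte code
            if func == FUNC_STOP:
                break
            if func == FUNC_LATEST_OFFSET:
                payload = reader.latestOffset().encode("utf-8")  # offset as JSON
                f.write(struct.pack(">i", len(payload)))         # length-prefixed reply
                f.write(payload)
                f.flush()
{code}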
[jira] [Updated] (SPARK-46866) Streaming python data source API
     [ https://issues.apache.org/jira/browse/SPARK-46866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-46866:
---------------------------------
    Affects Version/s: 4.0.0
                       (was: 3.5.0)

> Streaming python data source API
> --------------------------------
>
>                 Key: SPARK-46866
>                 URL: https://issues.apache.org/jira/browse/SPARK-46866
>             Project: Spark
>          Issue Type: Epic
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Chaoqin Li
>            Priority: Major
>
> This is a follow-up to https://issues.apache.org/jira/browse/SPARK-44076. The
> idea is to enable Python developers to develop streaming data sources in
> Python. The goal is to make a Python-based API that is simple and easy to
> use, thus making Spark more accessible to the wider Python developer
> community.
>
> Design doc:
> https://docs.google.com/document/d/1cJ-w1hGPOBFp-5DLmf68sTLsAOwb55oW6SAuuAUFEM4/edit
[jira] [Updated] (SPARK-46866) Streaming python data source API
     [ https://issues.apache.org/jira/browse/SPARK-46866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-46866:
---------------------------------
    Component/s: Structured Streaming
                 (was: SS)

> Streaming python data source API
> --------------------------------
>
>                 Key: SPARK-46866
>                 URL: https://issues.apache.org/jira/browse/SPARK-46866
>             Project: Spark
>          Issue Type: Epic
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Chaoqin Li
>            Priority: Major
>
> This is a follow-up to https://issues.apache.org/jira/browse/SPARK-44076. The
> idea is to enable Python developers to develop streaming data sources in
> Python. The goal is to make a Python-based API that is simple and easy to
> use, thus making Spark more accessible to the wider Python developer
> community.
>
> Design doc:
> https://docs.google.com/document/d/1cJ-w1hGPOBFp-5DLmf68sTLsAOwb55oW6SAuuAUFEM4/edit
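To make the goal concrete, here is a sketch of what a user-defined streaming
source could look like, assuming an interface along the lines of the design
doc (the pyspark.sql.datasource module, method names, and the InputPartition
shape here are assumptions at this stage, not the final API):

{code:python}
from pyspark.sql.datasource import DataSource, DataSourceStreamReader, InputPartition

class CounterSource(DataSource):
    """Toy source: each microbatch emits the values between two counter offsets."""

    @classmethod
    def name(cls):
        return "counter"

    def schema(self):
        return "value long"

    def streamReader(self, schema):
        return CounterStreamReader()

class CounterStreamReader(DataSourceStreamReader):
    def __init__(self):
        self._current = 0

    def initialOffset(self):
        return {"value": 0}            # offsets are plain dicts

    def latestOffset(self):
        self._current += 10            # pretend 10 new records arrived
        return {"value": self._current}

    def partitions(self, start, end):
        # One partition covering the whole offset range.
        return [InputPartition((start["value"], end["value"]))]

    def read(self, partition):
        lo, hi = partition.value
        for v in range(lo, hi):
            yield (v,)                 # one row per value
{code}

A source like this would be registered with spark.dataSource.register(CounterSource)
and consumed with spark.readStream.format("counter").load(), mirroring how the
batch Python data sources from SPARK-44076 are used.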
[jira] [Created] (SPARK-47031) Union with non-deterministic expression should be non-deterministic
Holden Karau created SPARK-47031:
------------------------------------

             Summary: Union with non-deterministic expression should be non-deterministic
                 Key: SPARK-47031
                 URL: https://issues.apache.org/jira/browse/SPARK-47031
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Holden Karau

We already have special-case handling for nullability, where any expression
that is unioned with a nullable field becomes nullable; we should do the same
for determinism. I found this while poking around with push-downs. I believe
the code to be updated is {{output}} in the {{Union}} case class.
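Conceptually, the proposal mirrors how Union already widens nullability across
its children. A simplified Scala sketch (not the actual Catalyst code; ColMeta
and mergeUnionColumns are invented for illustration):

{code:scala}
// Simplified model of per-column metadata; Catalyst tracks these on
// Attribute/Expression, not on a standalone case class like this.
case class ColMeta(nullable: Boolean, deterministic: Boolean)

def mergeUnionColumns(children: Seq[Seq[ColMeta]]): Seq[ColMeta] =
  children.transpose.map { cols =>
    ColMeta(
      nullable = cols.exists(_.nullable),           // existing rule: nullable if any input is
      deterministic = cols.forall(_.deterministic)  // proposed rule: deterministic only if all inputs are
    )
  }
{code}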
[jira] [Resolved] (SPARK-47030) Add `WebBrowserTest`
     [ https://issues.apache.org/jira/browse/SPARK-47030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47030.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45089
[https://github.com/apache/spark/pull/45089]

> Add `WebBrowserTest`
> --------------------
>
>                 Key: SPARK-47030
>                 URL: https://issues.apache.org/jira/browse/SPARK-47030
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL, Structured Streaming, Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Assigned] (SPARK-47025) Switch `Guava 19.0` dependency scope from `provided` to `test`
     [ https://issues.apache.org/jira/browse/SPARK-47025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie reassigned SPARK-47025:
--------------------------------
    Assignee: Dongjoon Hyun

> Switch `Guava 19.0` dependency scope from `provided` to `test`
> --------------------------------------------------------------
>
>                 Key: SPARK-47025
>                 URL: https://issues.apache.org/jira/browse/SPARK-47025
>             Project: Spark
>          Issue Type: Test
>          Components: Build, SQL, Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Resolved] (SPARK-47025) Switch `Guava 19.0` dependency scope from `provided` to `test`
     [ https://issues.apache.org/jira/browse/SPARK-47025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-47025.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45088
[https://github.com/apache/spark/pull/45088]

> Switch `Guava 19.0` dependency scope from `provided` to `test`
> --------------------------------------------------------------
>
>                 Key: SPARK-47025
>                 URL: https://issues.apache.org/jira/browse/SPARK-47025
>             Project: Spark
>          Issue Type: Test
>          Components: Build, SQL, Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Assigned] (SPARK-47004) Increase Scala client test coverage
     [ https://issues.apache.org/jira/browse/SPARK-47004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47004:
------------------------------------
    Assignee: Bo Gao

> Increase Scala client test coverage
> -----------------------------------
>
>                 Key: SPARK-47004
>                 URL: https://issues.apache.org/jira/browse/SPARK-47004
>             Project: Spark
>          Issue Type: Task
>          Components: Connect, Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Bo Gao
>            Assignee: Bo Gao
>            Priority: Major
>              Labels: pull-request-available
[jira] [Resolved] (SPARK-47004) Increase Scala client test coverage
     [ https://issues.apache.org/jira/browse/SPARK-47004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47004.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45063
[https://github.com/apache/spark/pull/45063]

> Increase Scala client test coverage
> -----------------------------------
>
>                 Key: SPARK-47004
>                 URL: https://issues.apache.org/jira/browse/SPARK-47004
>             Project: Spark
>          Issue Type: Task
>          Components: Connect, Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Bo Gao
>            Assignee: Bo Gao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Resolved] (SPARK-47026) Include coverage of JSON data sources in array/struct/map default value tests
     [ https://issues.apache.org/jira/browse/SPARK-47026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47026.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45086
[https://github.com/apache/spark/pull/45086]

> Include coverage of JSON data sources in array/struct/map default value tests
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-47026
>                 URL: https://issues.apache.org/jira/browse/SPARK-47026
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Mark Jarvin
>            Assignee: Mark Jarvin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Assigned] (SPARK-47026) Include coverage of JSON data sources in array/struct/map default value tests
     [ https://issues.apache.org/jira/browse/SPARK-47026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47026:
------------------------------------
    Assignee: Mark Jarvin

> Include coverage of JSON data sources in array/struct/map default value tests
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-47026
>                 URL: https://issues.apache.org/jira/browse/SPARK-47026
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Mark Jarvin
>            Assignee: Mark Jarvin
>            Priority: Major
>              Labels: pull-request-available
[jira] [Resolved] (SPARK-46944) Follow up to SPARK-46792: Fix minor typing oversight
     [ https://issues.apache.org/jira/browse/SPARK-46944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46944.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44983
[https://github.com/apache/spark/pull/44983]

> Follow up to SPARK-46792: Fix minor typing oversight
> ----------------------------------------------------
>
>                 Key: SPARK-46944
>                 URL: https://issues.apache.org/jira/browse/SPARK-46944
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Alice Sayutina
>            Assignee: Alice Sayutina
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Updated] (SPARK-47030) Add `WebBrowserTest`
     [ https://issues.apache.org/jira/browse/SPARK-47030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47030:
----------------------------------
    Component/s: Spark Core
                 SQL
                 Structured Streaming

> Add `WebBrowserTest`
> --------------------
>
>                 Key: SPARK-47030
>                 URL: https://issues.apache.org/jira/browse/SPARK-47030
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL, Structured Streaming, Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
[jira] [Assigned] (SPARK-47030) Add `WebBrowserTest`
     [ https://issues.apache.org/jira/browse/SPARK-47030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47030:
-------------------------------------
    Assignee: Dongjoon Hyun

> Add `WebBrowserTest`
> --------------------
>
>                 Key: SPARK-47030
>                 URL: https://issues.apache.org/jira/browse/SPARK-47030
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (SPARK-47030) Add `WebBrowserTest`
     [ https://issues.apache.org/jira/browse/SPARK-47030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47030:
-----------------------------------
    Labels: pull-request-available
            (was: )

> Add `WebBrowserTest`
> --------------------
>
>                 Key: SPARK-47030
>                 URL: https://issues.apache.org/jira/browse/SPARK-47030
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
[jira] [Created] (SPARK-47030) Add `WebBrowserTest`
Dongjoon Hyun created SPARK-47030:
-------------------------------------

             Summary: Add `WebBrowserTest`
                 Key: SPARK-47030
                 URL: https://issues.apache.org/jira/browse/SPARK-47030
             Project: Spark
          Issue Type: Sub-task
          Components: Tests
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
     [ https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47014:
-----------------------------------
    Labels: pull-request-available
            (was: )

> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
> --------------------------------------------------------------------------
>
>                 Key: SPARK-47014
>                 URL: https://issues.apache.org/jira/browse/SPARK-47014
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Xinrong Meng
>            Priority: Major
>              Labels: pull-request-available
>
> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[jira] [Resolved] (SPARK-47027) Use temporary directories for profiler test outputs
     [ https://issues.apache.org/jira/browse/SPARK-47027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47027.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45087
[https://github.com/apache/spark/pull/45087]

> Use temporary directories for profiler test outputs
> ---------------------------------------------------
>
>                 Key: SPARK-47027
>                 URL: https://issues.apache.org/jira/browse/SPARK-47027
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Resolved] (SPARK-47024) Sum of floats/doubles may be incorrect depending on partitioning
     [ https://issues.apache.org/jira/browse/SPARK-47024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Chammas resolved SPARK-47024.
--------------------------------------
    Resolution: Not A Problem

Resolving this as "Not A Problem". I mean, it _is_ a problem, but it's a basic
problem with floats, and I don't think there is anything practical that can be
done about it in Spark.

> Sum of floats/doubles may be incorrect depending on partitioning
> ----------------------------------------------------------------
>
>                 Key: SPARK-47024
>                 URL: https://issues.apache.org/jira/browse/SPARK-47024
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.2, 3.5.0, 3.3.4
>            Reporter: Nicholas Chammas
>            Priority: Major
>              Labels: correctness
>
> I found this problem using
> [Hypothesis|https://hypothesis.readthedocs.io/en/latest/].
> Here's a reproduction that fails on {{master}}, 3.5.0, 3.4.2, and 3.3.4
> (and probably all prior versions as well):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import col, sum
>
> SUM_EXAMPLE = [
>     (1.0,),
>     (0.0,),
>     (1.0,),
>     (9007199254740992.0,),
> ]
>
> spark = (
>     SparkSession.builder
>     .config("spark.log.level", "ERROR")
>     .getOrCreate()
> )
>
> def compare_sums(data, num_partitions):
>     df = spark.createDataFrame(data, "val double").coalesce(1)
>     result1 = df.agg(sum(col("val"))).collect()[0][0]
>     df = spark.createDataFrame(data, "val double").repartition(num_partitions)
>     result2 = df.agg(sum(col("val"))).collect()[0][0]
>     assert result1 == result2, f"{result1}, {result2}"
>
> if __name__ == "__main__":
>     print(compare_sums(SUM_EXAMPLE, 2))
> {code}
> This fails as follows:
> {code:python}
> AssertionError: 9007199254740994.0, 9007199254740992.0
> {code}
> I suspected some kind of problem related to code generation, so I tried
> setting all of these to {{false}}:
> * {{spark.sql.codegen.wholeStage}}
> * {{spark.sql.codegen.aggregate.map.twolevel.enabled}}
> * {{spark.sql.codegen.aggregate.splitAggregateFunc.enabled}}
> But this did not change the behavior. Somehow, the partitioning of the data
> affects the computed sum.
[jira] [Updated] (SPARK-47024) Sum of floats/doubles may be incorrect depending on partitioning
     [ https://issues.apache.org/jira/browse/SPARK-47024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Chammas updated SPARK-47024:
-------------------------------------
    Description:
I found this problem using [Hypothesis|https://hypothesis.readthedocs.io/en/latest/].

Here's a reproduction that fails on {{master}}, 3.5.0, 3.4.2, and 3.3.4 (and
probably all prior versions as well):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum

SUM_EXAMPLE = [
    (1.0,),
    (0.0,),
    (1.0,),
    (9007199254740992.0,),
]

spark = (
    SparkSession.builder
    .config("spark.log.level", "ERROR")
    .getOrCreate()
)

def compare_sums(data, num_partitions):
    df = spark.createDataFrame(data, "val double").coalesce(1)
    result1 = df.agg(sum(col("val"))).collect()[0][0]
    df = spark.createDataFrame(data, "val double").repartition(num_partitions)
    result2 = df.agg(sum(col("val"))).collect()[0][0]
    assert result1 == result2, f"{result1}, {result2}"

if __name__ == "__main__":
    print(compare_sums(SUM_EXAMPLE, 2))
{code}

This fails as follows:

{code:python}
AssertionError: 9007199254740994.0, 9007199254740992.0
{code}

I suspected some kind of problem related to code generation, so I tried setting
all of these to {{false}}:
* {{spark.sql.codegen.wholeStage}}
* {{spark.sql.codegen.aggregate.map.twolevel.enabled}}
* {{spark.sql.codegen.aggregate.splitAggregateFunc.enabled}}

But this did not change the behavior. Somehow, the partitioning of the data
affects the computed sum.

        (was: Will fill in the details shortly.)

    Summary: Sum of floats/doubles may be incorrect depending on partitioning
        (was: Sum is incorrect (exact cause currently unknown))

Sadly, I think this is a case where we may not be able to do anything. The
problem appears to be a classic case of floating point arithmetic going wrong.

{code:scala}
scala> 9007199254740992.0 + 1.0
val res0: Double = 9.007199254740992E15

scala> 9007199254740992.0 + 2.0
val res1: Double = 9.007199254740994E15
{code}

Notice how adding {{1.0}} did not change the large value, whereas adding
{{2.0}} did. So what I believe is happening is that, depending on the order in
which the rows happen to be added, we either hit or do not hit this corner
case. In other words, if the aggregation goes like this:

{code:java}
(1.0 + 1.0) + (0.0 + 9007199254740992.0)
2.0 + 9007199254740992.0
9007199254740994.0
{code}

Then there is no problem. However, if we are unlucky and it goes like this:

{code:java}
(1.0 + 0.0) + (1.0 + 9007199254740992.0)
1.0 + 9007199254740992.0
9007199254740992.0
{code}

Then we get the incorrect result shown in the description above. This violates
what I believe should be an invariant in Spark: declarative aggregates like
{{sum}} should not compute different results depending on accidents of row
order or partitioning. However, given that this is a basic problem of floating
point arithmetic, I doubt we can really do anything here.

Note that there are many such "special" numbers that have this problem, not
just 9007199254740992.0:

{code:scala}
scala> 1.7168917017330176e+16 + 1.0
val res2: Double = 1.7168917017330176E16

scala> 1.7168917017330176e+16 + 2.0
val res3: Double = 1.7168917017330178E16
{code}

> Sum of floats/doubles may be incorrect depending on partitioning
> ----------------------------------------------------------------
>
>                 Key: SPARK-47024
>                 URL: https://issues.apache.org/jira/browse/SPARK-47024
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.2, 3.5.0, 3.3.4
>            Reporter: Nicholas Chammas
>            Priority: Major
>              Labels: correctness
>
> I found this problem using
> [Hypothesis|https://hypothesis.readthedocs.io/en/latest/].
> Here's a reproduction that fails on {{master}}, 3.5.0, 3.4.2, and 3.3.4
> (and probably all prior versions as well):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import col, sum
>
> SUM_EXAMPLE = [
>     (1.0,),
>     (0.0,),
>     (1.0,),
>     (9007199254740992.0,),
> ]
>
> spark = (
>     SparkSession.builder
>     .config("spark.log.level", "ERROR")
>     .getOrCreate()
> )
>
> def compare_sums(data, num_partitions):
>     df = spark.createDataFrame(data, "val double").coalesce(1)
>     result1 = df.agg(sum(col("val"))).collect()[0][0]
>     df = spark.createDataFrame(data, "val double").repartition(num_partitions)
>     result2 = df.agg(sum(col("val"))).collect()[0][0]
>     assert result1 == result2, f"{result1}, {result2}"
>
> if __name__ == "__main__":
>     print(compare_sums(SUM_EXAMPLE, 2))
> {code}
> This fails as follows:
> {code:python}
> AssertionError: 9007199254740994.0, 9007199254740992.0
> {code}
> I suspected some kind of problem related to code generation, so I tried
> setting all of these to {{false}}:
> * {{spark.sql.codegen.wholeStage}}
> * {{spark.sql.codegen.aggregate.map.twolevel.enabled}}
> * {{spark.sql.codegen.aggregate.splitAggregateFunc.enabled}}
> But this did not change the behavior. Somehow, the partitioning of the data
> affects the computed sum.
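The order dependence described in the comment above can be demonstrated in
plain Python, without Spark, using the same four values from the reproduction:

{code:python}
# The same four values sum differently depending on accumulation order.
vals = [1.0, 0.0, 1.0, 9007199254740992.0]

print(sum(vals))            # 9007199254740994.0: the small values combine first
print(sum(reversed(vals)))  # 9007199254740992.0: each +1.0 is absorbed by the large value
{code}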
[jira] [Updated] (SPARK-47027) Use temporary directories for profiler test outputs
     [ https://issues.apache.org/jira/browse/SPARK-47027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47027:
-----------------------------------
    Labels: pull-request-available
            (was: )

> Use temporary directories for profiler test outputs
> ---------------------------------------------------
>
>                 Key: SPARK-47027
>                 URL: https://issues.apache.org/jira/browse/SPARK-47027
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (SPARK-47026) Include coverage of JSON data sources in array/struct/map default value tests
     [ https://issues.apache.org/jira/browse/SPARK-47026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47026:
-----------------------------------
    Labels: pull-request-available
            (was: )

> Include coverage of JSON data sources in array/struct/map default value tests
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-47026
>                 URL: https://issues.apache.org/jira/browse/SPARK-47026
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Mark Jarvin
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (SPARK-47027) Use temporary directories for profiler test outputs
     [ https://issues.apache.org/jira/browse/SPARK-47027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-47027:
----------------------------------
    Summary: Use temporary directories for profiler test outputs
        (was: Move TestUtils to the generic testing utils.)

> Use temporary directories for profiler test outputs
> ---------------------------------------------------
>
>                 Key: SPARK-47027
>                 URL: https://issues.apache.org/jira/browse/SPARK-47027
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Takuya Ueshin
>            Priority: Major
[jira] [Created] (SPARK-47029) ALTER COLUMN DROP DEFAULT test fails with JSON data sources
Mark Jarvin created SPARK-47029:
-----------------------------------

             Summary: ALTER COLUMN DROP DEFAULT test fails with JSON data sources
                 Key: SPARK-47029
                 URL: https://issues.apache.org/jira/browse/SPARK-47029
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Mark Jarvin

Enabling the JSON data source causes a test case to fail:

{code:java}
[info] - SPARK-39557 INSERT INTO statements with tables with map defaults *** FAILED *** (1 second, 498 milliseconds)
[info]   Results do not match for query:
[info]   Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-2880,dstSavings=360,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-2880,dstSavings=360,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=720,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=720,endTimeMode=0]]
[info]   Timezone Env:
[info]
[info]   == Parsed Logical Plan ==
[info]   'UnresolvedRelation [t], [], false
[info]
[info]   == Analyzed Logical Plan ==
[info]   i: int, s: struct>,y:array>>, t: array>
[info]   SubqueryAlias spark_catalog.default.t
[info]   +- Relation spark_catalog.default.t[i#13929,s#13930,t#13931] json
[info]
[info]   == Optimized Logical Plan ==
[info]   Relation spark_catalog.default.t[i#13929,s#13930,t#13931] json
[info]
[info]   == Physical Plan ==
[info]   FileScan json spark_catalog.default.t[i#13929,s#13930,t#13931] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex(1 paths)[file:/home/mark.jarvin/photon/spark/sql/core/spark-warehouse/org.apach..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct>,y:array>>,t:array
[info]   struct>,y:array>>,t:array>>
[info]   ![1,[List([1,2]),List(Map(def -> false, jkl -> true))],List(Map(xyz -> true))]   [1,[ArraySeq([1,2]),ArraySeq(Map(def -> false, jkl -> true))],ArraySeq(Map(xyz -> true))]
[info]   ![2,null,List(Map(xyz -> true))]   [2,[ArraySeq([1,2]),ArraySeq(Map(def -> false, jkl -> true))],ArraySeq(Map(xyz -> true))]
[info]   ![3,[List([3,4]),List(Map(mno -> false, pqr -> true))],List(Map(xyz -> true))]   [3,[ArraySeq([3,4]),ArraySeq(Map(mno -> false, pqr -> true))],ArraySeq(Map(xyz -> true))]
[info]   ![4,[List([3,4]),List(Map(mno -> false, pqr -> true))],List(Map(xyz -> true))]   [4,[ArraySeq([3,4]),ArraySeq(Map(mno -> false, pqr -> true))],ArraySeq(Map(xyz -> true))] (QueryTest.scala:267)
{code}
[jira] [Created] (SPARK-47028) Check SparkUnsupportedOperationException instead of UnsupportedOperationException
Max Gekk created SPARK-47028:
--------------------------------

             Summary: Check SparkUnsupportedOperationException instead of UnsupportedOperationException
                 Key: SPARK-47028
                 URL: https://issues.apache.org/jira/browse/SPARK-47028
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Max Gekk
            Assignee: Max Gekk

Use checkError() to test the SparkUnsupportedOperationException exception
instead of UnsupportedOperationException in the SQL project.
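For illustration, the migration in a test typically looks like the sketch
below, assuming the checkError() helper from Spark's test framework; the error
class name and the parameters map are placeholders, not taken from a specific
test:

{code:scala}
// Before: only the JVM exception type is asserted.
intercept[UnsupportedOperationException] {
  df.collect()
}

// After: assert the Spark error class and message parameters as well.
// "UNSUPPORTED_CALL" and the parameters map are illustrative placeholders.
checkError(
  exception = intercept[SparkUnsupportedOperationException] {
    df.collect()
  },
  errorClass = "UNSUPPORTED_CALL",
  parameters = Map("methodName" -> "collect", "className" -> "Dataset"))
{code}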
[jira] [Created] (SPARK-47027) Move TestUtils to the generic testing utils.
Takuya Ueshin created SPARK-47027:
-------------------------------------

             Summary: Move TestUtils to the generic testing utils.
                 Key: SPARK-47027
                 URL: https://issues.apache.org/jira/browse/SPARK-47027
             Project: Spark
          Issue Type: Test
          Components: Tests
    Affects Versions: 4.0.0
            Reporter: Takuya Ueshin
[jira] [Created] (SPARK-47026) Include coverage of JSON data sources in array/struct/map default value tests
Mark Jarvin created SPARK-47026:
-----------------------------------

             Summary: Include coverage of JSON data sources in array/struct/map default value tests
                 Key: SPARK-47026
                 URL: https://issues.apache.org/jira/browse/SPARK-47026
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Mark Jarvin
[jira] [Resolved] (SPARK-47023) Upgrade `aircompressor` to 1.26
     [ https://issues.apache.org/jira/browse/SPARK-47023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47023.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45084
[https://github.com/apache/spark/pull/45084]

> Upgrade `aircompressor` to 1.26
> -------------------------------
>
>                 Key: SPARK-47023
>                 URL: https://issues.apache.org/jira/browse/SPARK-47023
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Updated] (SPARK-47025) Switch `Guava 19.0` dependency scope from `provided` to `test`
     [ https://issues.apache.org/jira/browse/SPARK-47025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47025:
-----------------------------------
    Labels: pull-request-available
            (was: )

> Switch `Guava 19.0` dependency scope from `provided` to `test`
> --------------------------------------------------------------
>
>                 Key: SPARK-47025
>                 URL: https://issues.apache.org/jira/browse/SPARK-47025
>             Project: Spark
>          Issue Type: Test
>          Components: Build, SQL, Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Created] (SPARK-47025) Switch `Guava 19.0` dependency scope from `provided` to `test`
Dongjoon Hyun created SPARK-47025:
-------------------------------------

             Summary: Switch `Guava 19.0` dependency scope from `provided` to `test`
                 Key: SPARK-47025
                 URL: https://issues.apache.org/jira/browse/SPARK-47025
             Project: Spark
          Issue Type: Test
          Components: SQL, Tests
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47025) Switch `Guava 19.0` dependency scope from `provided` to `test`
     [ https://issues.apache.org/jira/browse/SPARK-47025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47025:
----------------------------------
    Component/s: Build

> Switch `Guava 19.0` dependency scope from `provided` to `test`
> --------------------------------------------------------------
>
>                 Key: SPARK-47025
>                 URL: https://issues.apache.org/jira/browse/SPARK-47025
>             Project: Spark
>          Issue Type: Test
>          Components: Build, SQL, Tests
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
[jira] [Created] (SPARK-47024) Sum is incorrect (exact cause currently unknown)
Nicholas Chammas created SPARK-47024:
----------------------------------------

             Summary: Sum is incorrect (exact cause currently unknown)
                 Key: SPARK-47024
                 URL: https://issues.apache.org/jira/browse/SPARK-47024
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.4, 3.5.0, 3.4.2
            Reporter: Nicholas Chammas

Will fill in the details shortly.
[jira] [Resolved] (SPARK-44445) Upgrade to `htmlunit` 3.10.0 and `htmlunit3-driver` 4.17.0
     [ https://issues.apache.org/jira/browse/SPARK-44445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-44445.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45079
[https://github.com/apache/spark/pull/45079]

> Upgrade to `htmlunit` 3.10.0 and `htmlunit3-driver` 4.17.0
> ----------------------------------------------------------
>
>                 Key: SPARK-44445
>                 URL: https://issues.apache.org/jira/browse/SPARK-44445
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Bjørn Jørgensen
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> [CVE-2023-26119|https://nvd.nist.gov/vuln/detail/CVE-2023-26119]
[jira] [Created] (SPARK-47023) Upgrade `aircompressor` to 1.26
Dongjoon Hyun created SPARK-47023:
-------------------------------------

             Summary: Upgrade `aircompressor` to 1.26
                 Key: SPARK-47023
                 URL: https://issues.apache.org/jira/browse/SPARK-47023
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun