[jira] [Updated] (SPARK-41873) Implement DataFrame `pandas_api`
[ https://issues.apache.org/jira/browse/SPARK-41873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-41873:
-----------------------------------
    Labels: pull-request-available  (was: )

> Implement DataFrame `pandas_api`
>
> Key: SPARK-41873
> URL: https://issues.apache.org/jira/browse/SPARK-41873
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
> Labels: pull-request-available

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47182) Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
[ https://issues.apache.org/jira/browse/SPARK-47182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47182.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45278
[https://github.com/apache/spark/pull/45278]

> Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
>
> Key: SPARK-47182
> URL: https://issues.apache.org/jira/browse/SPARK-47182
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-47176) Have a ResolveAllExpressionsUpWithPruning helper function
[ https://issues.apache.org/jira/browse/SPARK-47176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-47176.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45270
[https://github.com/apache/spark/pull/45270]

> Have a ResolveAllExpressionsUpWithPruning helper function
>
> Key: SPARK-47176
> URL: https://issues.apache.org/jira/browse/SPARK-47176
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47182) Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
[ https://issues.apache.org/jira/browse/SPARK-47182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47182:
-------------------------------------
    Assignee: Dongjoon Hyun

> Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
>
> Key: SPARK-47182
> URL: https://issues.apache.org/jira/browse/SPARK-47182
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47182) Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
[ https://issues.apache.org/jira/browse/SPARK-47182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47182:
-----------------------------------
    Labels: pull-request-available  (was: )

> Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
>
> Key: SPARK-47182
> URL: https://issues.apache.org/jira/browse/SPARK-47182
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47182) Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
Dongjoon Hyun created SPARK-47182:
----------------------------------
Summary: Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
Key: SPARK-47182
URL: https://issues.apache.org/jira/browse/SPARK-47182
Project: Spark
Issue Type: Sub-task
Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-41811) Implement SparkSession.sql's string formatter
[ https://issues.apache.org/jira/browse/SPARK-41811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-41811:
-----------------------------------
    Labels: pull-request-available  (was: )

> Implement SparkSession.sql's string formatter
>
> Key: SPARK-41811
> URL: https://issues.apache.org/jira/browse/SPARK-41811
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> **
> File "/.../spark/python/pyspark/sql/connect/session.py", line 345, in pyspark.sql.connect.session.SparkSession.sql
> Failed example:
>     spark.sql(
>         "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}",
>         bound1=7, bound2=9
>     ).show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         spark.sql(
>     TypeError: sql() got an unexpected keyword argument 'bound1'
> **
> File "/.../spark/python/pyspark/sql/connect/session.py", line 355, in pyspark.sql.connect.session.SparkSession.sql
> Failed example:
>     spark.sql(
>         "SELECT {col} FROM {mydf} WHERE id IN {x}",
>         col=mydf.id, mydf=mydf, x=tuple(range(4))).show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         spark.sql(
>     TypeError: sql() got an unexpected keyword argument 'col'
> {code}
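The doctest failures above come from Spark Connect not yet accepting keyword arguments in `spark.sql`. As a rough illustration of the substitution the string formatter performs, here is a minimal pure-Python sketch (the helper `format_sql` is hypothetical; the real formatter also handles DataFrames and Column objects, which this sketch omits):

```python
import string

def format_sql(query: str, **kwargs) -> str:
    """Hypothetical sketch: replace named placeholders in a SQL string
    with SQL renderings of the keyword arguments."""
    def render(value):
        if isinstance(value, str):
            # Quote string literals, doubling embedded single quotes.
            return "'" + value.replace("'", "''") + "'"
        if isinstance(value, tuple):
            return "(" + ", ".join(render(v) for v in value) + ")"
        return str(value)
    rendered = {k: render(v) for k, v in kwargs.items()}
    return string.Formatter().vformat(query, (), rendered)

print(format_sql(
    "SELECT * FROM range(10) WHERE id > {bound1} AND id < {bound2}",
    bound1=7, bound2=9))
# SELECT * FROM range(10) WHERE id > 7 AND id < 9
```

This only sketches the literal-substitution half of the feature; resolving `{mydf}` to a temp view, as in the second failing example, requires session state that plain string formatting cannot model.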
[jira] [Assigned] (SPARK-47181) Fix `MasterSuite` to validate the number of registered workers
[ https://issues.apache.org/jira/browse/SPARK-47181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47181:
-------------------------------------
    Assignee: Dongjoon Hyun

> Fix `MasterSuite` to validate the number of registered workers
>
> Key: SPARK-47181
> URL: https://issues.apache.org/jira/browse/SPARK-47181
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, Tests
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-47181) Fix `MasterSuite` to validate the number of registered workers
[ https://issues.apache.org/jira/browse/SPARK-47181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47181.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45274
[https://github.com/apache/spark/pull/45274]

> Fix `MasterSuite` to validate the number of registered workers
>
> Key: SPARK-47181
> URL: https://issues.apache.org/jira/browse/SPARK-47181
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, Tests
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47178) Add a test case for createDataFrame with dataclasses
[ https://issues.apache.org/jira/browse/SPARK-47178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47178:
------------------------------------
    Assignee: Hyukjin Kwon

> Add a test case for createDataFrame with dataclasses
>
> Key: SPARK-47178
> URL: https://issues.apache.org/jira/browse/SPARK-47178
> Project: Spark
> Issue Type: Test
> Components: PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-47178) Add a test case for createDataFrame with dataclasses
[ https://issues.apache.org/jira/browse/SPARK-47178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47178.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45271
[https://github.com/apache/spark/pull/45271]

> Add a test case for createDataFrame with dataclasses
>
> Key: SPARK-47178
> URL: https://issues.apache.org/jira/browse/SPARK-47178
> Project: Spark
> Issue Type: Test
> Components: PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
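For context on what a `createDataFrame`-with-dataclasses test exercises, the core conversion is flattening dataclass instances into rows plus column names. A minimal pure-Python sketch, under the assumption that this is roughly the mapping involved (the helper `to_rows` and the `Person` class are illustrative only; the real conversion also infers Spark SQL types):

```python
from dataclasses import dataclass, fields, astuple

@dataclass
class Person:
    name: str
    age: int

def to_rows(items):
    """Flatten dataclass instances into (row tuples, column names) --
    roughly what createDataFrame must do when handed dataclass objects."""
    if not items:
        return [], []
    cols = [f.name for f in fields(items[0])]
    return [astuple(x) for x in items], cols

rows, cols = to_rows([Person("a", 1), Person("b", 2)])
print(cols)  # ['name', 'age']
print(rows)  # [('a', 1), ('b', 2)]
```

With pyspark itself, the same data would typically be passed directly: `spark.createDataFrame([Person("a", 1), Person("b", 2)])`.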
[jira] [Updated] (SPARK-47181) Fix `MasterSuite` to validate the number of registered workers
[ https://issues.apache.org/jira/browse/SPARK-47181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47181:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix `MasterSuite` to validate the number of registered workers
>
> Key: SPARK-47181
> URL: https://issues.apache.org/jira/browse/SPARK-47181
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, Tests
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47181) Fix `MasterSuite` to validate the number of registered workers
Dongjoon Hyun created SPARK-47181:
----------------------------------
Summary: Fix `MasterSuite` to validate the number of registered workers
Key: SPARK-47181
URL: https://issues.apache.org/jira/browse/SPARK-47181
Project: Spark
Issue Type: Sub-task
Components: Spark Core, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Created] (SPARK-47180) Migrate CSV parsing off of Univocity
Nicholas Chammas created SPARK-47180:
-------------------------------------
Summary: Migrate CSV parsing off of Univocity
Key: SPARK-47180
URL: https://issues.apache.org/jira/browse/SPARK-47180
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Nicholas Chammas

Univocity appears to be unmaintained. As of February 2024:
* The last release was [more than 3 years ago|https://github.com/uniVocity/univocity-parsers/releases].
* The last commit to {{master}} was [almost 3 years ago|https://github.com/uniVocity/univocity-parsers/commits/master/].
* The website is [down|https://github.com/uniVocity/univocity-parsers/issues/506].
* There are [multiple|https://github.com/uniVocity/univocity-parsers/issues/494] [open|https://github.com/uniVocity/univocity-parsers/issues/495] [bugs|https://github.com/uniVocity/univocity-parsers/issues/499] on the tracker with no indication that anyone cares.

It's not urgent, but we should consider migrating to an actively maintained CSV library in the JVM ecosystem. There are a bunch of libraries [listed here on this Maven Repository|https://mvnrepository.com/open-source/csv-libraries]. [jackson-dataformats-text|https://github.com/FasterXML/jackson-dataformats-text] looks interesting. I know we already use FasterXML to parse JSON. Perhaps we should use them to parse CSV as well.

I'm guessing we chose Univocity back in the day because it was the fastest CSV library on the JVM. However, the last performance benchmark comparing it to others was [from February 2018|https://github.com/uniVocity/csv-parsers-comparison/blob/5548b52f2cc27eb19c11464e9a331491e8ad4ba6/README.md#statistics-updated-28th-of-february-2018], so this may no longer be true.
[jira] [Resolved] (SPARK-47166) Improve merge_spark_pr.py by emphasising input and error
[ https://issues.apache.org/jira/browse/SPARK-47166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-47166.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45256
[https://github.com/apache/spark/pull/45256]

> Improve merge_spark_pr.py by emphasising input and error
>
> Key: SPARK-47166
> URL: https://issues.apache.org/jira/browse/SPARK-47166
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-47179) Improve error message from SparkThrowableSuite for better debuggability
[ https://issues.apache.org/jira/browse/SPARK-47179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47179:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve error message from SparkThrowableSuite for better debuggability
>
> Key: SPARK-47179
> URL: https://issues.apache.org/jira/browse/SPARK-47179
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> The current error message is not very helpful when the error-class documentation is out of date, so we should improve it.
[jira] [Created] (SPARK-47179) Improve error message from SparkThrowableSuite for better debuggability
Haejoon Lee created SPARK-47179:
--------------------------------
Summary: Improve error message from SparkThrowableSuite for better debuggability
Key: SPARK-47179
URL: https://issues.apache.org/jira/browse/SPARK-47179
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Haejoon Lee

The current error message is not very helpful when the error-class documentation is out of date, so we should improve it.
[jira] [Updated] (SPARK-47178) Add a test case for createDataFrame with dataclasses
[ https://issues.apache.org/jira/browse/SPARK-47178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47178:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add a test case for createDataFrame with dataclasses
>
> Key: SPARK-47178
> URL: https://issues.apache.org/jira/browse/SPARK-47178
> Project: Spark
> Issue Type: Test
> Components: PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47178) Add a test case for createDataFrame with dataclasses
Hyukjin Kwon created SPARK-47178:
---------------------------------
Summary: Add a test case for createDataFrame with dataclasses
Key: SPARK-47178
URL: https://issues.apache.org/jira/browse/SPARK-47178
Project: Spark
Issue Type: Test
Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-47175) Remove ZOOKEEPER-1844 comment from KafkaTestUtils
[ https://issues.apache.org/jira/browse/SPARK-47175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-47175.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45265
[https://github.com/apache/spark/pull/45265]

> Remove ZOOKEEPER-1844 comment from KafkaTestUtils
>
> Key: SPARK-47175
> URL: https://issues.apache.org/jira/browse/SPARK-47175
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47175) Remove ZOOKEEPER-1844 comment from KafkaTestUtils
[ https://issues.apache.org/jira/browse/SPARK-47175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-47175:
--------------------------------
    Assignee: Dongjoon Hyun

> Remove ZOOKEEPER-1844 comment from KafkaTestUtils
>
> Key: SPARK-47175
> URL: https://issues.apache.org/jira/browse/SPARK-47175
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Trivial
> Labels: pull-request-available
[jira] [Assigned] (SPARK-47165) Pull docker image only when it's absent
[ https://issues.apache.org/jira/browse/SPARK-47165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao reassigned SPARK-47165:
--------------------------------
    Assignee: Kent Yao

> Pull docker image only when it's absent
>
> Key: SPARK-47165
> URL: https://issues.apache.org/jira/browse/SPARK-47165
> Project: Spark
> Issue Type: Test
> Components: Spark Docker
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-47165) Pull docker image only when it's absent
[ https://issues.apache.org/jira/browse/SPARK-47165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-47165.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45255
[https://github.com/apache/spark/pull/45255]

> Pull docker image only when it's absent
>
> Key: SPARK-47165
> URL: https://issues.apache.org/jira/browse/SPARK-47165
> Project: Spark
> Issue Type: Test
> Components: Spark Docker
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-47164) Make Default Value From Wider Type Narrow Literal of v2 behave the same as v1
[ https://issues.apache.org/jira/browse/SPARK-47164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47164.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45254
[https://github.com/apache/spark/pull/45254]

> Make Default Value From Wider Type Narrow Literal of v2 behave the same as v1
>
> Key: SPARK-47164
> URL: https://issues.apache.org/jira/browse/SPARK-47164
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-47176) Have a ResolveAllExpressionsUpWithPruning helper function
[ https://issues.apache.org/jira/browse/SPARK-47176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47176:
-----------------------------------
    Labels: pull-request-available  (was: )

> Have a ResolveAllExpressionsUpWithPruning helper function
>
> Key: SPARK-47176
> URL: https://issues.apache.org/jira/browse/SPARK-47176
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47177) Cached SQL plan does not display final AQE plan in explain string
[ https://issues.apache.org/jira/browse/SPARK-47177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziqi Liu updated SPARK-47177:
-----------------------------
    Description:

An AQE plan is expected to display the final plan after execution. This is not true for a cached SQL plan: it shows the initial plan instead. This behavior change was introduced in [https://github.com/apache/spark/pull/40812], which fixed a concurrency issue with cached plans.

I don't have a clear idea how to fix it yet. Maybe we can check whether the AQE plan is finalized (making the final flag atomic first, of course): if not, we can return the cloned plan; otherwise it is thread-safe to return the final plan, since it is immutable.

A simple repro:
{code:java}
d1 = spark.range(1000).withColumn("key", expr("id % 100")).groupBy("key").agg({"key": "count"})
cached_d2 = d1.cache()
df = cached_d2.withColumn("key2", expr("key % 10")).groupBy("key2").agg({"key2": "count"})
df.collect()
{code}
{code:java}
>>> df.explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) HashAggregate(keys=[key2#36L], functions=[count(key2#36L)])
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange hashpartitioning(key2#36L, 200), ENSURE_REQUIREMENTS, [plan_id=83]
            +- *(1) HashAggregate(keys=[key2#36L], functions=[partial_count(key2#36L)])
               +- *(1) Project [(key#27L % 10) AS key2#36L]
                  +- TableCacheQueryStage 0
                     +- InMemoryTableScan [key#27L]
                           +- InMemoryRelation [key#27L, count(key)#33L], StorageLevel(disk, memory, deserialized, 1 replicas)
                                 +- AdaptiveSparkPlan isFinalPlan=false
                                    +- HashAggregate(keys=[key#4L], functions=[count(key#4L)])
                                       +- Exchange hashpartitioning(key#4L, 200), ENSURE_REQUIREMENTS, [plan_id=33]
                                          +- HashAggregate(keys=[key#4L], functions=[partial_count(key#4L)])
                                             +- Project [(id#2L % 100) AS key#4L]
                                                +- Range (0, 1000, step=1, splits=10)
+- == Initial Plan ==
   HashAggregate(keys=[key2#36L], functions=[count(key2#36L)])
   +- Exchange hashpartitioning(key2#36L, 200), ENSURE_REQUIREMENTS, [plan_id=30]
      +- HashAggregate(keys=[key2#36L], functions=[partial_count(key2#36L)])
         +- Project [(key#27L % 10) AS key2#36L]
            +- InMemoryTableScan [key#27L]
                  +- InMemoryRelation [key#27L, count(key)#33L], StorageLevel(disk, memory, deserialized, 1 replicas)
                        +- AdaptiveSparkPlan isFinalPlan=false
                           +- HashAggregate(keys=[key#4L], functions=[count(key#4L)])
                              +- Exchange hashpartitioning(key#4L, 200), ENSURE_REQUIREMENTS, [plan_id=33]
                                 +- HashAggregate(keys=[key#4L], functions=[partial_count(key#4L)])
                                    +- Project [(id#2L % 100) AS key#4L]
                                       +- Range (0, 1000, step=1, splits=10)
{code}
[jira] [Created] (SPARK-47177) Cached SQL plan does not display final AQE plan in explain string
Ziqi Liu created SPARK-47177:
-----------------------------
Summary: Cached SQL plan does not display final AQE plan in explain string
Key: SPARK-47177
URL: https://issues.apache.org/jira/browse/SPARK-47177
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.5.1, 3.5.0, 3.4.2, 4.0.0, 3.5.2
Reporter: Ziqi Liu

An AQE plan is expected to display the final plan after execution. This is not true for a cached SQL plan: it shows the initial plan instead. This behavior change was introduced in [https://github.com/apache/spark/pull/40812], which fixed a concurrency issue with cached plans.

I don't have a clear idea how to fix it yet. Maybe we can check whether the AQE plan is finalized (making the final flag atomic first, of course): if not, we can return the cloned plan; otherwise it is thread-safe to return the final plan, since it is immutable.

A simple repro:
{code:java}
d1 = spark.range(1000).withColumn("key", expr("id % 100")).groupBy("key").agg({"key": "count"})
cached_d2 = d1.cache()
df = cached_d2.withColumn("key2", expr("key % 10")).groupBy("key2").agg({"key2": "count"})
df.collect()
{code}
{code:java}
Row(key2=7, count(key2)=10), Row(key2=3, count(key2)=10), Row(key2=1, count(key2)=10), Row(key2=8, count(key2)=10)]
>>> df.explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) HashAggregate(keys=[key2#36L], functions=[count(key2#36L)])
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange hashpartitioning(key2#36L, 200), ENSURE_REQUIREMENTS, [plan_id=83]
            +- *(1) HashAggregate(keys=[key2#36L], functions=[partial_count(key2#36L)])
               +- *(1) Project [(key#27L % 10) AS key2#36L]
                  +- TableCacheQueryStage 0
                     +- InMemoryTableScan [key#27L]
                           +- InMemoryRelation [key#27L, count(key)#33L], StorageLevel(disk, memory, deserialized, 1 replicas)
                                 +- AdaptiveSparkPlan isFinalPlan=false
                                    +- HashAggregate(keys=[key#4L], functions=[count(key#4L)])
                                       +- Exchange hashpartitioning(key#4L, 200), ENSURE_REQUIREMENTS, [plan_id=33]
                                          +- HashAggregate(keys=[key#4L], functions=[partial_count(key#4L)])
                                             +- Project [(id#2L % 100) AS key#4L]
                                                +- Range (0, 1000, step=1, splits=10)
+- == Initial Plan ==
   HashAggregate(keys=[key2#36L], functions=[count(key2#36L)])
   +- Exchange hashpartitioning(key2#36L, 200), ENSURE_REQUIREMENTS, [plan_id=30]
      +- HashAggregate(keys=[key2#36L], functions=[partial_count(key2#36L)])
         +- Project [(key#27L % 10) AS key2#36L]
            +- InMemoryTableScan [key#27L]
                  +- InMemoryRelation [key#27L, count(key)#33L], StorageLevel(disk, memory, deserialized, 1 replicas)
                        +- AdaptiveSparkPlan isFinalPlan=false
                           +- HashAggregate(keys=[key#4L], functions=[count(key#4L)])
                              +- Exchange hashpartitioning(key#4L, 200), ENSURE_REQUIREMENTS, [plan_id=33]
                                 +- HashAggregate(keys=[key#4L], functions=[partial_count(key#4L)])
                                    +- Project [(id#2L % 100) AS key#4L]
                                       +- Range (0, 1000, step=1, splits=10)
{code}
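The fix idea sketched in the ticket (check an atomic "finalized" flag; share the immutable final plan, hand out a defensive copy while the plan is still mutating) can be modeled in a few lines of plain Python. This is a toy illustration of the proposed approach only; all names here are hypothetical and nothing below is Spark's actual API:

```python
import copy
import threading

class AdaptivePlan:
    """Toy model: expose the final plan once execution finishes, and
    return a snapshot copy while the plan may still be re-optimized."""
    def __init__(self, tree):
        self.tree = tree
        self._final = threading.Event()  # stands in for an atomic isFinalPlan flag

    def finalize(self, tree):
        self.tree = tree
        self._final.set()

    def plan_for_explain(self):
        # A finalized plan is immutable, so sharing it is thread-safe;
        # otherwise return a deep copy to avoid racing with mutation.
        return self.tree if self._final.is_set() else copy.deepcopy(self.tree)

p = AdaptivePlan(["initial"])
snapshot = p.plan_for_explain()      # copy: plan not yet final
p.finalize(["final"])
shared = p.plan_for_explain()        # same object: plan is now immutable
```

The design trade-off this models: cloning on every `explain()` (the current behavior after the linked PR) is always safe but loses the final plan; gating on the flag keeps safety while letting `isFinalPlan=true` plans show through.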
[jira] [Created] (SPARK-47176) Have a ResolveAllExpressionsUpWithPruning helper function
Rui Wang created SPARK-47176:
-----------------------------
Summary: Have a ResolveAllExpressionsUpWithPruning helper function
Key: SPARK-47176
URL: https://issues.apache.org/jira/browse/SPARK-47176
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Rui Wang
Assignee: Rui Wang
[jira] [Updated] (SPARK-47094) SPJ : Dynamically rebalance number of buckets when they are not equal
[ https://issues.apache.org/jira/browse/SPARK-47094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho updated SPARK-47094:
------------------------------
    Parent: SPARK-37375
    Issue Type: Sub-task  (was: New Feature)

> SPJ : Dynamically rebalance number of buckets when they are not equal
>
> Key: SPARK-47094
> URL: https://issues.apache.org/jira/browse/SPARK-47094
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Himadri Pal
> Priority: Major
> Labels: pull-request-available
>
> SPJ (Storage Partition Join) works with Iceberg tables when both tables have the same number of buckets. As part of this feature request, we would like Spark to gather the bucket counts from both tables and dynamically rebalance them via coalesce or repartition so that SPJ works. In this case we would still have to shuffle, but less than with no SPJ at all.
>
> Use case:
> We often do not control the input tables, so it is not possible to change the partitioning scheme on those tables. As consumers, we would still like them to participate in SPJ when joined with other tables and output tables that have a different number of buckets.
> In this scenario, we would otherwise need to read those tables and rewrite them with a matching number of buckets for SPJ to work; this extra step can outweigh the benefit of the reduced shuffle. Also, when multiple different tables are joined, each table needs to be rewritten with a matching number of buckets.
> If this feature is implemented, SPJ functionality will be more powerful.
[jira] [Updated] (SPARK-47094) SPJ : Dynamically rebalance number of buckets when they are not equal
[ https://issues.apache.org/jira/browse/SPARK-47094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47094: --- Labels: pull-request-available (was: ) > SPJ : Dynamically rebalance number of buckets when they are not equal > - > > Key: SPARK-47094 > URL: https://issues.apache.org/jira/browse/SPARK-47094 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.3.0, 3.4.0 >Reporter: Himadri Pal >Priority: Major > Labels: pull-request-available > > SPJ: Storage Partition Join works with Iceberg tables when both tables have the same number of buckets. As part of this feature request, we would like Spark to gather the number-of-buckets information from both tables and dynamically rebalance the number of buckets by coalescing or repartitioning so that SPJ will work. In this case we would still have to shuffle, but that would be better than no SPJ. > Use Case: > Many times we do not have control over the input tables, so it is not possible to change the partitioning scheme on those tables. As consumers, we would still like them to be usable with SPJ when joined with other tables and output tables that have a different number of buckets. > In this scenario, we would need to read those tables and rewrite them with a matching number of buckets for SPJ to work; this extra step could outweigh the benefit of less shuffle via SPJ. Also, when multiple different tables are being joined, each table needs to be rewritten with a matching number of buckets. > If this feature is implemented, SPJ functionality will be more powerful. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
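The request above boils down to choosing a bucket count both sides can agree on before the join. The following is not Spark's SPJ implementation — just a sketch of the planning arithmetic with hypothetical names: when the target count divides both tables' bucket counts (the gcd always does), each new bucket is a union of whole old buckets, because `hash % g == (hash % N) % g` whenever `g` divides `N`, so rows never have to move across bucket boundaries.

```python
from math import gcd

def plan_bucket_rebalance(left_buckets: int, right_buckets: int) -> dict:
    """Hypothetical planning step: pick a common bucket count for two
    hash-bucketed tables so a storage-partitioned join can line up
    partitions. Coalescing down to the gcd keeps every new bucket a
    union of whole old buckets on both sides."""
    g = gcd(left_buckets, right_buckets)
    return {
        "target": g,
        # how many old buckets are merged into each new bucket, per side
        "coalesce_left": left_buckets // g,
        "coalesce_right": right_buckets // g,
    }

# e.g. an 8-bucket table joined to a 12-bucket table can both coalesce to 4
plan = plan_bucket_rebalance(8, 12)
```

When the counts share no factor (e.g. 7 and 12), the gcd is 1, which degenerates to a single partition — at that point a full repartition, as the description suggests, is the realistic fallback.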
[jira] [Comment Edited] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820915#comment-17820915 ] Mich Talebzadeh edited comment on SPARK-24815 at 2/26/24 11:58 PM: --- Now that the ticket is reopened let us review the submitted documents. This has got 6 votes as of today. I volunteered to mentor it until a committer comes forward. Hope this helps to speed up the process and time to delivery. was (Author: mich.talebza...@gmail.com): Now that the ticket is reopened let us review the submitted documents. This has got 6 votes for now. I volunteered to mentor it until a committer comes forward to it. Hope this helps to speed up the process and time to delivery. > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. 
> 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820915#comment-17820915 ] Mich Talebzadeh commented on SPARK-24815: - Now that the ticket is reopened let us review the submitted documents. This has got 6 votes for now. I volunteered to mentor it until a committer comes forward to it. Hope this helps to speed up the process and time to delivery. > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. > 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
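For context, the batch heuristic the description refers to can be caricatured in a few lines. This is a toy model with made-up names and thresholds, not Spark's actual `ExecutorAllocationManager` or its configs: it sizes the executor pool from the task backlog and releases executors that sit idle — exactly the behavior that misfires for structured streaming, where the backlog drops to zero between micro-batches and the pool oscillates.

```python
def batch_scaling_decision(current: int, backlog_tasks: int, idle_executors: int,
                           tasks_per_executor: int = 4, max_executors: int = 20) -> int:
    """Toy backlog-driven policy: positive result = executors to request,
    negative = idle executors to release, 0 = no change."""
    # size the pool to the backlog, ceiling division, capped at the maximum
    wanted = min(max_executors, -(-backlog_tasks // tasks_per_executor))
    if wanted > current:
        return wanted - current
    # otherwise shed idle executors down toward the wanted size
    return -min(idle_executors, current - wanted)
```

A micro-batch job alternates between a large backlog (scale up) and an empty one (scale down), which is why the ticket argues streaming needs its own algorithm.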
[jira] [Resolved] (SPARK-44400) Improve Scala StreamingQueryListener to provide users a way to access the Spark session for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44400. -- Resolution: Duplicate > Improve Scala StreamingQueryListener to provide users a way to access the > Spark session for Spark Connect > - > > Key: SPARK-44400 > URL: https://issues.apache.org/jira/browse/SPARK-44400 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Bo Gao >Priority: Major > > Improve the Listener to provide users a way to access the Spark session and > perform arbitrary actions inside the Listener. Right now users can use `val > spark = SparkSession.builder.getOrCreate()` to create a Spark session inside > the Listener, but this is a legacy session instead of a connect remote > session. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44462) Fix the session passed to foreachBatch.
[ https://issues.apache.org/jira/browse/SPARK-44462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44462. -- Resolution: Duplicate > Fix the session passed to foreachBatch. > > > Key: SPARK-44462 > URL: https://issues.apache.org/jira/browse/SPARK-44462 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > foreachBatch() in Connect uses the initial session used while starting the streaming query, but the streaming query runs with a cloned session, not the original session. We should set up the mapping for the cloned session and pass that in. Look for this ticket ID in the code for more context inline. > > Another issue with not creating a new session ID: the foreachBatch worker keeps the session alive. The session mapping at the Connect server does not expire, so the query keeps running even if the original client disappears. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39771) If spark.default.parallelism is unset, RDD defaultPartitioner may pick a value that is too large to successfully run
[ https://issues.apache.org/jira/browse/SPARK-39771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39771: --- Labels: pull-request-available (was: ) > If spark.default.parallelism is unset, RDD defaultPartitioner may pick a > value that is too large to successfully run > > > Key: SPARK-39771 > URL: https://issues.apache.org/jira/browse/SPARK-39771 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Josh Rosen >Priority: Major > Labels: pull-request-available > > [According to its docs|https://github.com/apache/spark/blob/899f6c90eb2de5b46a36710a131d7417010ce4b3/core/src/main/scala/org/apache/spark/Partitioner.scala#L45-L65], {{Partitioner.defaultPartitioner}} will use the maximum number of RDD partitions as its partition count when {{spark.default.parallelism}} is not set. If that number of upstream partitions is very large, this can result in shuffles where {{numMappers * numReducers = numMappers^2}}, which can cause various problems that prevent the job from successfully running.
> To help users identify when they have run into this problem, I think we should add warning logs to Spark.
> As an example of the problem, let's say that I have an RDD with 100,000 partitions and then do a {{reduceByKey}} on it without specifying an explicit partitioner or partition count. In this case, Spark will plan a reduce stage with 100,000 partitions:
> {code:java}
> scala> sc.parallelize(1 to 100000, 100000).map(x => (x, x)).reduceByKey(_ + _).toDebugString
> res7: String =
> (100000) ShuffledRDD[21] at reduceByKey at <console>:25 []
>  +-(100000) MapPartitionsRDD[20] at map at <console>:25 []
>     | ParallelCollectionRDD[19] at parallelize at <console>:25 []
> {code}
> This results in the creation of 10 billion shuffle blocks, so if this job _does_ run it is likely to be extremely slow. However, it's more likely that the driver will crash when serializing map output statuses: if we were able to use one bit per mapper/reducer pair (which is probably overly optimistic in terms of compressibility) then the map statuses would be ~1.25 gigabytes (and the actual size is probably much larger)!
> I don't think that users are likely to intentionally wind up in this scenario: it's more likely that either (a) their job depends on {{spark.default.parallelism}} being set but it was run in an environment lacking a value for that config, or (b) their input data significantly grew in size. These scenarios may be rare, but they can be frustrating to debug (especially if a failure occurs midway through a long-running job).
> I think we should do something to handle this scenario.
> A good starting point might be for {{Partitioner.defaultPartitioner}} to log a warning when the default partition count exceeds some threshold.
> In addition, I think it might be a good idea to log a similar warning in {{MapOutputTrackerMaster}} right before we start trying to serialize map statuses: in a real-world situation where this problem cropped up, the map stage ran successfully but the driver crashed when serializing map statuses. Putting a warning about partition counts here makes it more likely that users will spot that error in the logs and be able to identify the source of the problem (compared to a warning that appears much earlier in the job and therefore much farther from the likely site of a crash). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
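The sizing claim in the description is easy to verify with plain arithmetic: 100,000 mappers times 100,000 reducers gives 10 billion shuffle blocks, and at the (optimistic) one bit per mapper/reducer pair the map statuses alone need about 1.25 GB. A quick check, with the function name being mine:

```python
def shuffle_metadata_bits(num_mappers: int, num_reducers: int) -> int:
    """Lower bound from the description: one bit per (mapper, reducer) pair."""
    return num_mappers * num_reducers

n = 100_000                               # defaultPartitioner falls back to the upstream count
blocks = shuffle_metadata_bits(n, n)      # number of shuffle blocks: n^2
gigabytes = blocks / 8 / 1e9              # bits -> bytes -> GB of map statuses, best case
```

The quadratic growth is the point: doubling the partition count quadruples the metadata, which is why a warning at the serialization site would land close to the actual crash.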
[jira] [Updated] (SPARK-47175) Remove ZOOKEEPER-1844 comment from KafkaTestUtils
[ https://issues.apache.org/jira/browse/SPARK-47175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47175: --- Labels: pull-request-available (was: ) > Remove ZOOKEEPER-1844 comment from KafkaTestUtils > - > > Key: SPARK-47175 > URL: https://issues.apache.org/jira/browse/SPARK-47175 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47175) Remove ZOOKEEPER-1844 comment from KafkaTestUtils
Dongjoon Hyun created SPARK-47175: - Summary: Remove ZOOKEEPER-1844 comment from KafkaTestUtils Key: SPARK-47175 URL: https://issues.apache.org/jira/browse/SPARK-47175 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-47079: - Assignee: Desmond Cheong > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Assignee: Desmond Cheong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47079. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45131 [https://github.com/apache/spark/pull/45131] > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47063) CAST long to timestamp has different behavior for codegen vs interpreted
[ https://issues.apache.org/jira/browse/SPARK-47063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820877#comment-17820877 ] Robert Joseph Evans commented on SPARK-47063: - [~planga82] I was not planning on putting up a patch, but I would be willing to if no one else wants to put one up. I would just need to know if we want to clamp the result or if we are okay with the overflow. > CAST long to timestamp has different behavior for codegen vs interpreted > - > > Key: SPARK-47063 > URL: https://issues.apache.org/jira/browse/SPARK-47063 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2 >Reporter: Robert Joseph Evans >Priority: Major > > It probably impacts a lot more versions of the code than this, but I verified it on 3.4.2. This also appears to be related to https://issues.apache.org/jira/browse/SPARK-39209
> {code:java}
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-----------------------------+--------------------+
> |v                   |ts                           |unix_micros(ts)     |
> +--------------------+-----------------------------+--------------------+
> |9223372036854775807 |+294247-01-10 04:00:54.775807|9223372036854775807 |
> |-9223372036854775808|-290308-12-21 19:59:05.224192|-9223372036854775808|
> |0                   |1970-01-01 00:00:00          |0                   |
> |1990                |1970-01-01 00:33:10          |1990000000          |
> +--------------------+-----------------------------+--------------------+
>
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-------------------+---------------+
> |v                   |ts                 |unix_micros(ts)|
> +--------------------+-------------------+---------------+
> |9223372036854775807 |1969-12-31 23:59:59|-1000000       |
> |-9223372036854775808|1970-01-01 00:00:00|0              |
> |0                   |1970-01-01 00:00:00|0              |
> |1990                |1970-01-01 00:33:10|1990000000     |
> +--------------------+-------------------+---------------+
> {code}
> It looks like InMemoryTableScanExec is not doing code generation for the expressions, but the ProjectExec after the repartition is.
> If I disable code gen I get the same answer in both cases.
> {code:java}
> scala> spark.conf.set("spark.sql.codegen.wholeStage", false)
> scala> spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-----------------------------+--------------------+
> |v                   |ts                           |unix_micros(ts)     |
> +--------------------+-----------------------------+--------------------+
> |9223372036854775807 |+294247-01-10 04:00:54.775807|9223372036854775807 |
> |-9223372036854775808|-290308-12-21 19:59:05.224192|-9223372036854775808|
> |0                   |1970-01-01 00:00:00          |0                   |
> |1990                |1970-01-01 00:33:10          |1990000000          |
> +--------------------+-----------------------------+--------------------+
>
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-----------------------------+--------------------+
> |v                   |ts                           |unix_micros(ts)     |
> +--------------------+-----------------------------+--------------------+
> |9223372036854775807 |+294247-01-10 04:00:54.775807|9223372036854775807 |
> |-9223372036854775808|-290308-12-21 19:59:05.224192|-9223372036854775808|
> |0                   |1970-01-01 00:00:00          |0                   |
> |1990                |1970-01-01 00:33:10          |1990000000          |
> +--------------------+-----------------------------+--------------------+
> {code}
> [https://github.com/apache/spark/blob/e2cd71a4cd54bbdf5af76d3edfbb2fc8c1b067b6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1627] is the code used in codegen, but [https://github.com/apache/spark/blob/e2cd71a4cd54bbdf5af76d3edfbb2fc8c1b067b6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L687] is what is used outside of code gen.
> Apparently `SECONDS.toMicros` truncates the value on an overflow, but the codegen does not.
> {code:java}
> scala>
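The divergence described above comes down to the overflow semantics of the two multiply paths. A minimal simulation — plain Python standing in for the JVM arithmetic, with function names of my own choosing: `TimeUnit.SECONDS.toMicros` saturates (clamps) at the 64-bit long range, while the generated `v * MICROS_PER_SECOND` is an ordinary Java long multiply that wraps around.

```python
I64_MAX = 2**63 - 1
I64_MIN = -2**63
MICROS_PER_SECOND = 1_000_000

def to_micros_interpreted(seconds: int) -> int:
    """Mimics java.util.concurrent.TimeUnit.SECONDS.toMicros,
    which saturates at Long.MIN_VALUE/MAX_VALUE instead of overflowing."""
    v = seconds * MICROS_PER_SECOND
    return max(I64_MIN, min(I64_MAX, v))

def to_micros_codegen(seconds: int) -> int:
    """Mimics the generated `v * MICROS_PER_SECOND` Java long multiply,
    which wraps around on 64-bit overflow (two's complement)."""
    v = (seconds * MICROS_PER_SECOND) & (2**64 - 1)
    return v - 2**64 if v >= 2**63 else v
```

On in-range inputs the two agree; at the extremes they diverge exactly as in the tables above: `Long.MaxValue` saturates to `Long.MaxValue` interpreted but wraps to a small negative number of microseconds under codegen, and `Long.MinValue` wraps all the way to 0.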
[jira] [Updated] (SPARK-33356) DAG Scheduler exhibits exponential runtime with PartitionerAwareUnion
[ https://issues.apache.org/jira/browse/SPARK-33356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Wells updated SPARK-33356: Reporter: Lucas Brutschy (was: Lucas Brutschy) > DAG Scheduler exhibits exponential runtime with PartitionerAwareUnion > - > > Key: SPARK-33356 > URL: https://issues.apache.org/jira/browse/SPARK-33356 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.2, 3.0.1 > Environment: Reproducible locally with 3.0.1, 2.4.2, and latest master. >Reporter: Lucas Brutschy >Priority: Minor > > The current implementation of the {{DAGScheduler}} exhibits exponential runtime in DAGs with many {{PartitionerAwareUnions}}. The reason seems to be a mutual recursion between {{PartitionerAwareUnion.getPreferredLocations}} and {{DAGScheduler.getPreferredLocs}}. > A minimal example reproducing the issue:
> {code:scala}
> object Example extends App {
>   val partitioner = new HashPartitioner(2)
>   val sc = new SparkContext(new SparkConf().setAppName("").setMaster("local[*]"))
>   val rdd1 = sc.emptyRDD[(Int, Int)].partitionBy(partitioner)
>   val rdd2 = (1 to 30).map(_ => rdd1)
>   val rdd3 = rdd2.reduce(_ union _)
>   rdd3.collect()
> }
> {code}
> The whole app should take around one second to complete, as no actual work is done. However, it takes more time to submit the job than I am willing to wait. > The underlying cause appears to be mutual recursion between {{PartitionerAwareUnion.getPreferredLocations}} and {{DAGScheduler.getPreferredLocs}}, which restarts graph traversal at each {{PartitionerAwareUnion}} with no memoization. Each node of the DAG is visited {{O(n!)}} (exponentially many) times. > Note that it is clear to me that you could use {{sc.union(rdd2)}} instead of {{rdd2.reduce(_ union _)}} to eliminate the problem. I use this just to demonstrate the issue in a sufficiently small example. Given a large DAG and many PartitionerAwareUnions, especially ones constructed by iterative algorithms, the problem can become relevant even without "abuse" of the union operation. > The exponential recursion in the DAG Scheduler was largely fixed with SPARK-682, but in the special case of PartitionerAwareUnion it is still possible. This may actually be an underlying cause of SPARK-29181. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
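Why restarting the traversal without memoization explodes can be shown with a toy recursion. This is not the DAGScheduler's code — just the shape of the problem: each level re-derives the result from a shared parent twice, so the call count doubles per level, whereas a memoized version touches each node exactly once.

```python
from functools import lru_cache

DEPTH = 20
calls = 0

def locs_naive(node: int) -> int:
    """Re-derives 'preferred locations' from both references to the shared
    parent every time, like the mutual recursion described above."""
    global calls
    calls += 1
    if node == 0:
        return 1
    # two references to the same parent, each re-traversed from scratch
    return locs_naive(node - 1) + locs_naive(node - 1)

@lru_cache(maxsize=None)
def locs_memo(node: int) -> int:
    """Same recursion, but each node is computed once and cached."""
    if node == 0:
        return 1
    return locs_memo(node - 1) + locs_memo(node - 1)

locs_naive(DEPTH)  # drives `calls` to 2^(DEPTH+1) - 1, over 2 million
```

With 30 chained unions, as in the reproducer, the naive walk makes on the order of 2^31 calls — consistent with a job that never finishes submitting even though no work is done.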
[jira] [Resolved] (SPARK-47156) SparkSession returns a null context during a dataset creation
[ https://issues.apache.org/jira/browse/SPARK-47156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marc Le Bihan resolved SPARK-47156. --- Resolution: Not A Bug I was lacking Spark knowledge, and learned that executors don't have a context to give to anyone at runtime. > SparkSession returns a null context during a dataset creation > - > > Key: SPARK-47156 > URL: https://issues.apache.org/jira/browse/SPARK-47156 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.2 > Environment: Debian 12 > Java 17 >Reporter: Marc Le Bihan >Priority: Major > > First, I need to know whether I'm facing a bug or not. > If it is a bug, I'll manage to create a test to help you reproduce the case; if it isn't, maybe the Spark documentation could explain when {{sparkSession.getContext()}} can return {{null}}. > > I'm trying to ease my development by separating: > * parquet files management { checking existence, then loading them as cache, or saving data to them }, > * from dataset creation, when it doesn't exist yet and should be constituted from scratch. > > The method I'm using is this one:
> {code:java}
> protected Dataset<Row> constitutionStandard(OptionsCreationLecture optionsCreationLecture,
>    Supplier<Dataset<Row>> worker, CacheParqueteur<Row> cacheParqueteur) {
>    OptionsCreationLecture options = optionsCreationLecture != null ? optionsCreationLecture : optionsCreationLecture();
>    Dataset<Row> dataset = cacheParqueteur.call(options.useCache());
>    return dataset == null ? cacheParqueteur.save(cacheParqueteur.appliquer(worker.get())) : dataset;
> }
> {code}
> In case the dataset doesn't exist in parquet files (= cache) yet, it starts its creation by calling {{worker.get()}}, a {{Supplier}} of {{Dataset<Row>}}. > > A concrete usage is this one:
> {code:java}
> public Dataset<Row> rowEtablissements(OptionsCreationLecture optionsCreationLecture, HistoriqueExecution historiqueExecution, int anneeCOG, int anneeSIRENE, boolean actifsSeulement, boolean communesValides, boolean nomenclaturesNAF2Valides) {
>    OptionsCreationLecture options = optionsCreationLecture != null ? optionsCreationLecture : optionsCreationLecture();
>    Supplier<Dataset<Row>> worker = () -> {
>       super.setStageDescription(this.messageSource, "row.etablissements.libelle.long", "row.etablissements.libelle.court", anneeSIRENE, anneeCOG, actifsSeulement, communesValides, nomenclaturesNAF2Valides);
>
>       Map indexs = new HashMap<>();
>       Dataset<Row> etablissements = etablissementsNonFiltres(optionsCreationLecture, anneeSIRENE);
>       etablissements = etablissements.filter((FilterFunction<Row>) etablissement -> this.validator.validationEtablissement(this.session, historiqueExecution, etablissement, actifsSeulement, nomenclaturesNAF2Valides, indexs));
>
>       // If filtering to valid communes was requested, apply it.
>       if (communesValides) {
>          etablissements = rowRestreindreAuxCommunesValides(etablissements, anneeCOG, anneeSIRENE, indexs);
>       }
>       else {
>          etablissements = etablissements.withColumn("codeDepartement", substring(CODE_COMMUNE.col(), 1, 2));
>       }
>
>       // Attach the labels of the APE/NAF codes.
>       Dataset<Row> nomenclatureNAF = this.nafDataset.rowNomenclatureNAF(anneeSIRENE);
>       etablissements = etablissements.join(nomenclatureNAF, etablissements.col("activitePrincipale").equalTo(nomenclatureNAF.col("codeNAF")), "left_outer")
>          .drop("codeNAF", "niveauNAF");
>
>       // The dataset is now considered valid; its fields can be cast to their final types.
>       return this.validator.cast(etablissements);
>    };
>
>    return constitutionStandard(options, () -> worker.get()
>       .withColumn("partitionSiren", SIREN_ENTREPRISE.col().substr(1,2)),
>       new CacheParqueteur<>(options, this.session, "etablissements", "annee_{0,number,#0}-actifs_{1}-communes_verifiees_{2}-nafs_verifies_{3}", DEPARTEMENT_SIREN_SIRET, anneeSIRENE, anneeCOG, actifsSeulement, communesValides));
> }
> {code}
> > In the worker, a filter calls {{validationEtablissement(SparkSession, HistoriqueExecution, Row, ...)}} on each row to perform complete checking (eight rules to check an establishment's validity). > When a check fails, along with a warning log, I also count in the {{historiqueExecution}} object the number of problems of that kind I've encountered. > That function increases a {{LongAccumulator}} value, creating that accumulator first, and stores it in a {{Map}} of accumulators if needed.
> {code:java}
> public void
[jira] [Commented] (SPARK-47063) CAST long to timestamp has different behavior for codegen vs interpreted
[ https://issues.apache.org/jira/browse/SPARK-47063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820853#comment-17820853 ] Pablo Langa Blanco commented on SPARK-47063: [~revans2] are you working on the fix? > CAST long to timestamp has different behavior for codegen vs interpreted > - > > Key: SPARK-47063 > URL: https://issues.apache.org/jira/browse/SPARK-47063 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2 >Reporter: Robert Joseph Evans >Priority: Major > > It probably impacts a lot more versions of the code than this, but I verified it on 3.4.2. This also appears to be related to https://issues.apache.org/jira/browse/SPARK-39209
> {code:java}
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-----------------------------+--------------------+
> |v                   |ts                           |unix_micros(ts)     |
> +--------------------+-----------------------------+--------------------+
> |9223372036854775807 |+294247-01-10 04:00:54.775807|9223372036854775807 |
> |-9223372036854775808|-290308-12-21 19:59:05.224192|-9223372036854775808|
> |0                   |1970-01-01 00:00:00          |0                   |
> |1990                |1970-01-01 00:33:10          |1990000000          |
> +--------------------+-----------------------------+--------------------+
>
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-------------------+---------------+
> |v                   |ts                 |unix_micros(ts)|
> +--------------------+-------------------+---------------+
> |9223372036854775807 |1969-12-31 23:59:59|-1000000       |
> |-9223372036854775808|1970-01-01 00:00:00|0              |
> |0                   |1970-01-01 00:00:00|0              |
> |1990                |1970-01-01 00:33:10|1990000000     |
> +--------------------+-------------------+---------------+
> {code}
> It looks like InMemoryTableScanExec is not doing code generation for the expressions, but the ProjectExec after the repartition is.
> If I disable code gen I get the same answer in both cases.
> {code:java}
> scala> spark.conf.set("spark.sql.codegen.wholeStage", false)
> scala> spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-----------------------------+--------------------+
> |v                   |ts                           |unix_micros(ts)     |
> +--------------------+-----------------------------+--------------------+
> |9223372036854775807 |+294247-01-10 04:00:54.775807|9223372036854775807 |
> |-9223372036854775808|-290308-12-21 19:59:05.224192|-9223372036854775808|
> |0                   |1970-01-01 00:00:00          |0                   |
> |1990                |1970-01-01 00:33:10          |1990000000          |
> +--------------------+-----------------------------+--------------------+
>
> scala> Seq(Long.MaxValue, Long.MinValue, 0L, 1990L).toDF("v").repartition(1).selectExpr("*", "CAST(v AS timestamp) as ts").selectExpr("*", "unix_micros(ts)").show(false)
> +--------------------+-----------------------------+--------------------+
> |v                   |ts                           |unix_micros(ts)     |
> +--------------------+-----------------------------+--------------------+
> |9223372036854775807 |+294247-01-10 04:00:54.775807|9223372036854775807 |
> |-9223372036854775808|-290308-12-21 19:59:05.224192|-9223372036854775808|
> |0                   |1970-01-01 00:00:00          |0                   |
> |1990                |1970-01-01 00:33:10          |1990000000          |
> +--------------------+-----------------------------+--------------------+
> {code}
> [https://github.com/apache/spark/blob/e2cd71a4cd54bbdf5af76d3edfbb2fc8c1b067b6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1627] is the code used in codegen, but [https://github.com/apache/spark/blob/e2cd71a4cd54bbdf5af76d3edfbb2fc8c1b067b6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L687] is what is used outside of code gen.
> Apparently `SECONDS.toMicros` truncates the value on an overflow, but the codegen does not.
> {code:java}
> scala> Long.MaxValue
> res11: Long = 9223372036854775807
> scala> java.util.concurrent.TimeUnit.SECONDS.toMicros(Long.MaxValue)
> res12: Long = 9223372036854775807
> scala> Long.MaxValue
[jira] [Created] (SPARK-47174) Client Side Listener - Server side implementation
Wei Liu created SPARK-47174: --- Summary: Client Side Listener - Server side implementation Key: SPARK-47174 URL: https://issues.apache.org/jira/browse/SPARK-47174 Project: Spark Issue Type: Improvement Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47174) Client Side Listener - Server side implementation
[ https://issues.apache.org/jira/browse/SPARK-47174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820828#comment-17820828 ] Wei Liu commented on SPARK-47174: - I'm working on this. > Client Side Listener - Server side implementation > - > > Key: SPARK-47174 > URL: https://issues.apache.org/jira/browse/SPARK-47174 > Project: Spark > Issue Type: Improvement > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47173) fix typo in new streaming query listener explanation
[ https://issues.apache.org/jira/browse/SPARK-47173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820817#comment-17820817 ] Max Gekk commented on SPARK-47173: -- Resolved by https://github.com/apache/spark/pull/45263 > fix typo in new streaming query listener explanation > - > > Key: SPARK-47173 > URL: https://issues.apache.org/jira/browse/SPARK-47173 > Project: Spark > Issue Type: Improvement > Components: SS, UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > Misspelled: flatMapGroupsWithState was written as flatMapGroupWithState (missing an "s" after "Group"). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47173) fix typo in new streaming query listener explanation
[ https://issues.apache.org/jira/browse/SPARK-47173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-47173: - Fix Version/s: 4.0.0 > fix typo in new streaming query listener explanation > > > Key: SPARK-47173 > URL: https://issues.apache.org/jira/browse/SPARK-47173 > Project: Spark > Issue Type: Improvement > Components: SS, UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > misspelled > flatMapGroupsWithState as flatMapGroupWithState (missing an "s" after "Group") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47173) fix typo in new streaming query listener explanation
[ https://issues.apache.org/jira/browse/SPARK-47173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-47173: Assignee: Wei Liu > fix typo in new streaming query listener explanation > > > Key: SPARK-47173 > URL: https://issues.apache.org/jira/browse/SPARK-47173 > Project: Spark > Issue Type: Improvement > Components: SS, UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Trivial > Labels: pull-request-available > > misspelled > flatMapGroupsWithState as flatMapGroupWithState (missing an "s" after "Group") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46067) Upgrade commons-compress to 1.25.0
[ https://issues.apache.org/jira/browse/SPARK-46067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46067: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade commons-compress to 1.25.0 > -- > > Key: SPARK-46067 > URL: https://issues.apache.org/jira/browse/SPARK-46067 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://commons.apache.org/proper/commons-compress/changes-report.html#a1.25.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47173) fix typo in new streaming query listener explanation
[ https://issues.apache.org/jira/browse/SPARK-47173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47173: --- Labels: pull-request-available (was: ) > fix typo in new streaming query listener explanation > > > Key: SPARK-47173 > URL: https://issues.apache.org/jira/browse/SPARK-47173 > Project: Spark > Issue Type: Improvement > Components: SS, UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Trivial > Labels: pull-request-available > > misspelled > flatMapGroupsWithState as flatMapGroupWithState (missing an "s" after "Group") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47173) fix typo in new streaming query listener explanation
Wei Liu created SPARK-47173: --- Summary: fix typo in new streaming query listener explanation Key: SPARK-47173 URL: https://issues.apache.org/jira/browse/SPARK-47173 Project: Spark Issue Type: Improvement Components: SS, UI Affects Versions: 4.0.0 Reporter: Wei Liu misspelled flatMapGroupsWithState as flatMapGroupWithState (missing an "s" after "Group") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47172) Upgrade Transport block cipher mode to GCM
[ https://issues.apache.org/jira/browse/SPARK-47172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-47172: - Shepherd: (was: Sean R. Owen) > Upgrade Transport block cipher mode to GCM > -- > > Key: SPARK-47172 > URL: https://issues.apache.org/jira/browse/SPARK-47172 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.4.2, 3.5.0 >Reporter: Steve Weis >Priority: Minor > > The cipher transformation currently used for encrypting RPC calls is an > unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an > authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being > modified in transit. > The relevant line is here: > [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220] > GCM is relatively more computationally expensive than CTR and adds a 16-byte > block of authentication tag data to each payload. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47172) Upgrade Transport block cipher mode to GCM
Steve Weis created SPARK-47172: -- Summary: Upgrade Transport block cipher mode to GCM Key: SPARK-47172 URL: https://issues.apache.org/jira/browse/SPARK-47172 Project: Spark Issue Type: Improvement Components: Security Affects Versions: 3.5.0, 3.4.2 Reporter: Steve Weis The cipher transformation currently used for encrypting RPC calls is an unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being modified in transit. The relevant line is here: [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220] GCM is relatively more computationally expensive than CTR and adds a 16-byte block of authentication tag data to each payload. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
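The CTR-vs-GCM trade-off described in SPARK-47172 can be illustrated with a small standalone JDK sketch. This is not Spark code; the class name, payload, and key size here are made up for illustration. It shows that with AES/GCM/NoPadding the JCE `Cipher` appends a 128-bit authentication tag to the ciphertext, which is the 16-byte per-payload overhead the ticket mentions, and that decryption verifies the tag rather than silently accepting modified data.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Standalone sketch (not Spark's TransportConf): demonstrates why moving the
// RPC transformation from AES/CTR/NoPadding to AES/GCM/NoPadding adds a
// 16-byte authentication tag to each payload and makes tampering detectable.
public class GcmOverheadSketch {
    static int tagOverheadBytes() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];          // GCM's recommended 96-bit IV
        new SecureRandom().nextBytes(iv);
        byte[] plaintext = "rpc payload".getBytes(StandardCharsets.UTF_8);

        // Encrypt with a 128-bit (16-byte) authentication tag.
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal(plaintext);

        // Decryption verifies the tag; a modified ciphertext would throw
        // AEADBadTagException instead of silently yielding garbage, which is
        // what an unauthenticated CTR stream would do.
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        dec.doFinal(ciphertext);

        return ciphertext.length - plaintext.length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("GCM per-payload overhead: " + tagOverheadBytes() + " bytes");
    }
}
```

A fresh IV per message is mandatory for GCM; reusing an IV under the same key breaks both confidentiality and authenticity, which is one reason the mode switch is more involved than changing the transformation string.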
[jira] [Created] (SPARK-47171) Improve handling of new `exists` attributes within an aggregation
Anton Lykov created SPARK-47171: --- Summary: Improve handling of new `exists` attributes within an aggregation Key: SPARK-47171 URL: https://issues.apache.org/jira/browse/SPARK-47171 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 3.5.0 Reporter: Anton Lykov See PR comment for context: https://github.com/apache/spark/pull/45133#issuecomment-1949522246 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47170) Remove redundant scope identifier for `jakarta.servlet-api` and `javax.servlet-api`
[ https://issues.apache.org/jira/browse/SPARK-47170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47170. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45258 [https://github.com/apache/spark/pull/45258] > Remove redundant scope identifier for `jakarta.servlet-api` and > `javax.servlet-api` > --- > > Key: SPARK-47170 > URL: https://issues.apache.org/jira/browse/SPARK-47170 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: HiuFung Kwok >Assignee: HiuFung Kwok >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This is a follow-up ticket for SPARK-47046 to remove the redundant `scope` > XML clause - compile. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47170) Remove redundant scope identifier for `jakarta.servlet-api` and `javax.servlet-api`
[ https://issues.apache.org/jira/browse/SPARK-47170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47170: - Assignee: HiuFung Kwok > Remove redundant scope identifier for `jakarta.servlet-api` and > `javax.servlet-api` > --- > > Key: SPARK-47170 > URL: https://issues.apache.org/jira/browse/SPARK-47170 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: HiuFung Kwok >Assignee: HiuFung Kwok >Priority: Major > Labels: pull-request-available > > This is a follow-up ticket for SPARK-47046 to remove the redundant `scope` > XML clause - compile. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46077) Error in postgresql when pushing down filter by timestamp_ntz field
[ https://issues.apache.org/jira/browse/SPARK-46077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46077: --- Labels: pull-request-available (was: ) > Error in postgresql when pushing down filter by timestamp_ntz field > --- > > Key: SPARK-46077 > URL: https://issues.apache.org/jira/browse/SPARK-46077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Marina Krasilnikova >Priority: Minor > Labels: pull-request-available > > code to reproduce: > SparkSession sparkSession = SparkSession > .builder() > .appName("test-app") > .master("local[*]") > .config("spark.sql.timestampType", "TIMESTAMP_NTZ") > .getOrCreate(); > String url = "..."; > String catalogPropPrefix = "spark.sql.catalog.myc"; > sparkSession.conf().set(catalogPropPrefix, JDBCTableCatalog.class.getName()); > sparkSession.conf().set(catalogPropPrefix + ".url", url); > Map<String, String> options = new HashMap<>(); > options.put("driver", "org.postgresql.Driver"); > // options.put("pushDownPredicate", "false"); it works fine if this line is > uncommented > Dataset<Row> dataset = sparkSession.read() > .options(options) > .table("myc.demo.`My table`"); > dataset.createOrReplaceTempView("view1"); > String sql = "select * from view1 where `my date` = '2021-04-01 00:00:00'"; > Dataset<Row> result = sparkSession.sql(sql); > result.show(); > result.printSchema(); > Field `my date` is of type timestamp. This code results in > org.postgresql.util.PSQLException syntax error > > > String sql = "select * from view1 where `my date` = to_timestamp('2021-04-01 > 00:00:00', 'yyyy-MM-dd HH:mm:ss')"; // this query also doesn't work > String sql = "select * from view1 where `my date` = date_trunc('DAY', > to_timestamp('2021-04-01 00:00:00', 'yyyy-MM-dd HH:mm:ss'))"; // but this is > OK > > Is it a bug, or did I get something wrong? 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47169) Disable bucketing on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47169: --- Labels: pull-request-available (was: ) > Disable bucketing on collated columns > -- > > Key: SPARK-47169 > URL: https://issues.apache.org/jira/browse/SPARK-47169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.
[ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820705#comment-17820705 ] Denis Tarima edited comment on SPARK-46992 at 2/26/24 1:44 PM: --- I think the root problem is that {{{}cache{}}}/{{{}persist{}}} changes the result. It might be a necessary performance trade-off, but if it's possible to keep the same result then the problem will disappear. {{CacheManager}} is shared between sessions so {{{}persist{}}}/{{{}unpersist{}}} affects all {{Dataset}} instances immediately creating a possibility of inconsistent results. For example, thread 1 calls {{{}df.count(){}}}, thread 2 calls {{{}df.cache().count(){}}}, and finally thread 1 calls {{df.count()}} again - thread 1 may get different counts. If fixing the root problem is infeasible then the secondary problem needs to be addressed: {{queryExecution.executedPlan}} is cached ({{{}lazy val{}}}) in {{Dataset}} instance, but it's not used by all queries in the same way causing inconsistency. - {{df}} and {{dfCached = df.cache()}} could have different logical plans so {{df}} wouldn't use cached data, but this change would create a backward incompatibility - {{Dataset}} could verify if it's cached in {{CacheManager}} on each access to {{queryExecution}} and use/keep another {{queryExecution}} instance when it's in a "cached" state. was (Author: dtarima): I think the root problem is that {{{}cache{}}}/{{{}persist{}}} changes the result. It might be a necessary performance trade-off, but if it's possible to keep the same result then the problem will disappear. {{CacheManager}} is shared between sessions so {{{}persist{}}}/{{{}unpersist{}}} affects all {{Dataset}} instances immediately creating a possibility of inconsistent results. For example, thread 1 calls {{{}df.count(){}}}, thread 2 calls {{{}df.cache().count(){}}}, and finally thread 1 calls {{df.count()}} again - thread 1 may get different counts. 
If fixing the root problem is infeasible then the secondary problem needs to be addressed: {{queryExecution.executedPlan}} is cached ({{{}lazy val{}}}) in {{Dataset}} instance, but it's not used by all queries causing inconsistency. - {{df}} and {{dfCached = df.cache()}} could have different logical plans so {{df}} wouldn't use cached data, but this change would create a backward incompatibility - {{Dataset}} could verify if it's cached in {{CacheManager}} on each access to {{queryExecution}} and use/keep another {{queryExecution}} instance when it's in a "cached" state. > Inconsistent results with 'sort', 'cache', and AQE. > --- > > Key: SPARK-46992 > URL: https://issues.apache.org/jira/browse/SPARK-46992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.5.0 >Reporter: Denis Tarima >Priority: Critical > Labels: correctness, pull-request-available > > > With AQE enabled, having {color:#4c9aff}sort{color} in the plan changes > {color:#4c9aff}sample{color} results after caching. > Moreover, when cached, {color:#4c9aff}collect{color} returns records as if > it's not cached, which is inconsistent with {color:#4c9aff}count{color} and > {color:#4c9aff}show{color}. 
> A script to reproduce: > {code:scala} > import spark.implicits._ > val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123) > println("NON CACHED:") > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > println("CACHED:") > df.cache().count() > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > df.unpersist() > {code} > output: > {code:java} > NON CACHED: > count: 2 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 4| > +---+ > CACHED: > count: 3 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 2| > | 3| > +---+ > {code} > BTW, disabling AQE > [{color:#4c9aff}spark.conf.set("spark.databricks.optimizer.adaptive.enabled", > "false"){color}] helps on Databricks clusters, but locally it has no effect, > at least on Spark 3.3.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
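The lazy-val staleness described in the comment above can be sketched in miniature with hypothetical classes. These are not Spark's actual internals; the names `Frame` and `cacheRegistry` are made up stand-ins for `Dataset` and `CacheManager`. The point is only that once a derived plan is memoized, later changes to the shared registry are not observed by the holder of the memoized value:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not Spark's classes): a Dataset-like object memoizes
// its "executed plan" on first use, while a shared CacheManager-like registry
// can change afterwards - so the memoized plan goes stale.
public class StalePlanSketch {
    // Shared mutable state, analogous to the session-shared CacheManager.
    static Set<String> cacheRegistry = new HashSet<>();

    static class Frame {
        private String plan; // memoized, like Dataset's lazy val executedPlan

        String executedPlan() {
            if (plan == null) {
                // The plan depends on registry state at first evaluation only.
                plan = cacheRegistry.contains("frame") ? "ScanCached" : "ScanSource";
            }
            return plan;
        }
    }

    public static void main(String[] args) {
        Frame f = new Frame();
        String before = f.executedPlan();  // computed against the empty registry
        cacheRegistry.add("frame");        // another thread caches the data
        String after = f.executedPlan();   // memoized value: does not see the change
        System.out.println(before + " / " + after);
    }
}
```

A `Frame` constructed after the registry change would see `ScanCached`, which mirrors the report that different query paths (some going through the memoized plan, some not) can return different results for the "same" data.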
[jira] [Resolved] (SPARK-47009) Create table with collation
[ https://issues.apache.org/jira/browse/SPARK-47009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47009. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45105 [https://github.com/apache/spark/pull/45105] > Create table with collation > --- > > Key: SPARK-47009 > URL: https://issues.apache.org/jira/browse/SPARK-47009 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add support for creating table with columns containing non-default collated > data -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47009) Create table with collation
[ https://issues.apache.org/jira/browse/SPARK-47009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47009: --- Assignee: Stefan Kandic > Create table with collation > --- > > Key: SPARK-47009 > URL: https://issues.apache.org/jira/browse/SPARK-47009 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Add support for creating table with columns containing non-default collated > data -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.
[ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820705#comment-17820705 ] Denis Tarima commented on SPARK-46992: -- I think the root problem is that {{{}cache{}}}/{{{}persist{}}} changes the result. It might be a necessary performance trade-off, but if it's possible to keep the same result then the problem will disappear. {{CacheManager}} is shared between sessions so {{{}persist{}}}/{{{}unpersist{}}} affects all {{Dataset}} instances immediately creating a possibility of inconsistent results. For example, thread 1 calls {{{}df.count(){}}}, thread 2 calls {{{}df.cache().count(){}}}, and finally thread 1 calls {{df.count()}} again - thread 1 may get different counts. If fixing the root problem is infeasible then the secondary problem needs to be addressed: {{queryExecution.executedPlan}} is cached ({{{}lazy val{}}}) in {{Dataset}} instance, but it's not used by all queries causing inconsistency. - {{df}} and {{dfCached = df.cache()}} could have different logical plans so {{df}} wouldn't use cached data, but this change would create a backward incompatibility - {{Dataset}} could verify if it's cached in {{CacheManager}} on each access to {{queryExecution}} and use/keep another {{queryExecution}} instance when it's in a "cached" state. > Inconsistent results with 'sort', 'cache', and AQE. > --- > > Key: SPARK-46992 > URL: https://issues.apache.org/jira/browse/SPARK-46992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.5.0 >Reporter: Denis Tarima >Priority: Critical > Labels: correctness, pull-request-available > > > With AQE enabled, having {color:#4c9aff}sort{color} in the plan changes > {color:#4c9aff}sample{color} results after caching. > Moreover, when cached, {color:#4c9aff}collect{color} returns records as if > it's not cached, which is inconsistent with {color:#4c9aff}count{color} and > {color:#4c9aff}show{color}. 
> A script to reproduce: > {code:scala} > import spark.implicits._ > val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123) > println("NON CACHED:") > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > println("CACHED:") > df.cache().count() > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > df.unpersist() > {code} > output: > {code:java} > NON CACHED: > count: 2 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 4| > +---+ > CACHED: > count: 3 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 2| > | 3| > +---+ > {code} > BTW, disabling AQE > [{color:#4c9aff}spark.conf.set("spark.databricks.optimizer.adaptive.enabled", > "false"){color}] helps on Databricks clusters, but locally it has no effect, > at least on Spark 3.3.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47170) Remove redundant scope identifier for `jakarta.servlet-api` and `javax.servlet-api`
[ https://issues.apache.org/jira/browse/SPARK-47170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47170: --- Labels: pull-request-available (was: ) > Remove redundant scope identifier for `jakarta.servlet-api` and > `javax.servlet-api` > --- > > Key: SPARK-47170 > URL: https://issues.apache.org/jira/browse/SPARK-47170 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: HiuFung Kwok >Priority: Major > Labels: pull-request-available > > This is a follow-up ticket for SPARK-47046 to remove the redundant `scope` > XML clause - compile. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47170) Remove redundant scope identifier for `jakarta.servlet-api` and `javax.servlet-api`
[ https://issues.apache.org/jira/browse/SPARK-47170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820691#comment-17820691 ] Nikita Awasthi commented on SPARK-47170: User 'HiuKwok' has created a pull request for this issue: https://github.com/apache/spark/pull/45258 > Remove redundant scope identifier for `jakarta.servlet-api` and > `javax.servlet-api` > --- > > Key: SPARK-47170 > URL: https://issues.apache.org/jira/browse/SPARK-47170 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: HiuFung Kwok >Priority: Major > > This is a follow-up ticket for SPARK-47046 to remove the redundant `scope` > XML clause - compile. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47145) Provide table identifier to scan node when DS v2 strategy is applied
[ https://issues.apache.org/jira/browse/SPARK-47145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820690#comment-17820690 ] Uros Stankovic commented on SPARK-47145: PR for this change https://github.com/apache/spark/pull/45200 > Provide table identifier to scan node when DS v2 strategy is applied > > > Key: SPARK-47145 > URL: https://issues.apache.org/jira/browse/SPARK-47145 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Uros Stankovic >Priority: Minor > > Currently, DataSourceScanExec node can accept table identifier, and that > information can be useful for later logging, debugging, etc, but > DataSourceV2Strategy does not provide that information to scan node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47170) Remove redundant scope identifier for `jakarta.servlet-api` and `javax.servlet-api`
HiuFung Kwok created SPARK-47170: Summary: Remove redundant scope identifier for `jakarta.servlet-api` and `javax.servlet-api` Key: SPARK-47170 URL: https://issues.apache.org/jira/browse/SPARK-47170 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: HiuFung Kwok This is a follow-up ticket for SPARK-47046 to remove the redundant `scope` XML clause - compile. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47169) Disable bucketint on collated collumns
Mihailo Milosevic created SPARK-47169: - Summary: Disable bucketint on collated collumns Key: SPARK-47169 URL: https://issues.apache.org/jira/browse/SPARK-47169 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47169) Disable bucketing on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47169: -- Summary: Disable bucketing on collated columns (was: Disable bucketint on collated collumns) > Disable bucketing on collated columns > -- > > Key: SPARK-47169 > URL: https://issues.apache.org/jira/browse/SPARK-47169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47147) Fix Pyspark collated string conversion error
[ https://issues.apache.org/jira/browse/SPARK-47147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47147: --- Labels: pull-request-available (was: ) > Fix Pyspark collated string conversion error > > > Key: SPARK-47147 > URL: https://issues.apache.org/jira/browse/SPARK-47147 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When running Pyspark shell in non-Spark Connect mode, query "SELECT 'abc' > COLLATE 'UCS_BASIC_LCASE'" produces the following error: > {code:java} > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: {'error': > 'string(UCS_BASIC_LCASE)'}"} > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47168) Disable parquet filter pushdown for non default collated strings
[ https://issues.apache.org/jira/browse/SPARK-47168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47168: -- Summary: Disable parquet filter pushdown for non default collated strings (was: Disable filter pushdown for non default collated strings) > Disable parquet filter pushdown for non default collated strings > > > Key: SPARK-47168 > URL: https://issues.apache.org/jira/browse/SPARK-47168 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47168) Disable filter pushdown for non default collated strings
Stefan Kandic created SPARK-47168: - Summary: Disable filter pushdown for non default collated strings Key: SPARK-47168 URL: https://issues.apache.org/jira/browse/SPARK-47168 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Stefan Kandic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47033) EXECUTE IMMEDIATE USING does not recognize session variable names
[ https://issues.apache.org/jira/browse/SPARK-47033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820680#comment-17820680 ] A G commented on SPARK-47033: - I want to work on this! > EXECUTE IMMEDIATE USING does not recognize session variable names > - > > Key: SPARK-47033 > URL: https://issues.apache.org/jira/browse/SPARK-47033 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > > {noformat} > DECLARE parm = 'Hello'; > EXECUTE IMMEDIATE 'SELECT :parm' USING parm; > [ALL_PARAMETERS_MUST_BE_NAMED] Using name parameterized queries requires all > parameters to be named. Parameters missing names: "parm". SQLSTATE: 07001 > EXECUTE IMMEDIATE 'SELECT :parm' USING parm AS parm; > Hello > {noformat} > variables are like column references: they act as their own aliases and thus > should not be required to be named to associate with a named parameter with > the same name. > Note that unlike in PySpark, this should be case-insensitive (haven't > verified). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47167) Add descriptive relation class
Uros Stankovic created SPARK-47167: -- Summary: Add descriptive relation class Key: SPARK-47167 URL: https://issues.apache.org/jira/browse/SPARK-47167 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.5.1 Reporter: Uros Stankovic The BaseRelation class does not provide any descriptive information, such as a name or description, so it would be great to add such a class to make debugging and logging easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47165) Pull docker image only when its' absent
[ https://issues.apache.org/jira/browse/SPARK-47165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47165: --- Labels: pull-request-available (was: ) > Pull docker image only when its' absent > --- > > Key: SPARK-47165 > URL: https://issues.apache.org/jira/browse/SPARK-47165 > Project: Spark > Issue Type: Test > Components: Spark Docker >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47103) Make the default storage level of intermediate datasets for MLlib configurable
[ https://issues.apache.org/jira/browse/SPARK-47103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47103: -- Assignee: (was: Apache Spark) > Make the default storage level of intermediate datasets for MLlib configurable > -- > > Key: SPARK-47103 > URL: https://issues.apache.org/jira/browse/SPARK-47103 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47103) Make the default storage level of intermediate datasets for MLlib configurable
[ https://issues.apache.org/jira/browse/SPARK-47103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47103: -- Assignee: Apache Spark > Make the default storage level of intermediate datasets for MLlib configurable > -- > > Key: SPARK-47103 > URL: https://issues.apache.org/jira/browse/SPARK-47103 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47165) Pull docker image only when it's absent
[ https://issues.apache.org/jira/browse/SPARK-47165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47165: - Issue Type: Test (was: Improvement) > Pull docker image only when it's absent > --- > > Key: SPARK-47165 > URL: https://issues.apache.org/jira/browse/SPARK-47165 > Project: Spark > Issue Type: Test > Components: Spark Docker >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47165) Pull docker image only when it's absent
Kent Yao created SPARK-47165: Summary: Pull docker image only when it's absent Key: SPARK-47165 URL: https://issues.apache.org/jira/browse/SPARK-47165 Project: Spark Issue Type: Improvement Components: Spark Docker Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
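The behavior SPARK-47165 asks for, pulling a Docker image only when it is not already in the local cache, follows a common shell pattern. A minimal sketch of that pattern (the image name is a placeholder, and the actual pull is echoed rather than executed; this is not the project's test script):

```shell
IMAGE="${IMAGE:-apache/spark:4.0.0}"   # hypothetical image name for illustration

# Succeeds only if the image already exists in the local image cache.
image_present() { docker image inspect "$1" >/dev/null 2>&1; }

if image_present "$IMAGE"; then
  echo "image $IMAGE already present; skipping pull"
else
  echo "image $IMAGE absent; would run: docker pull $IMAGE"
fi
```

The check is cheap and local (`docker image inspect` never touches the registry), which is what makes skipping redundant pulls worthwhile in CI.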
[jira] [Assigned] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47158: -- Assignee: (was: Apache Spark) > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 > - > > Key: SPARK-47158 > URL: https://issues.apache.org/jira/browse/SPARK-47158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47158: -- Assignee: Apache Spark > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 > - > > Key: SPARK-47158 > URL: https://issues.apache.org/jira/browse/SPARK-47158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-45599: - Fix Version/s: 3.5.2 (was: 3.5.1) > Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset > -- > > Key: SPARK-45599 > URL: https://issues.apache.org/jira/browse/SPARK-45599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.6.3, 3.3.0, 3.2.3, 3.5.0 >Reporter: Robert Joseph Evans >Assignee: Nicholas Chammas >Priority: Critical > Labels: correctness, pull-request-available > Fix For: 4.0.0, 3.5.2 > > > I think this actually impacts all versions that have ever supported > percentile and it may impact other things because the bug is in OpenHashMap. > > I am really surprised that we caught this bug because everything has to hit > just wrong to make it happen. in python/pyspark if you run > > {code:python} > from math import * > from pyspark.sql.types import * > data = [(1.779652973678931e+173,), (9.247723870123388e-295,), > (5.891823952773268e+98,), (inf,), (1.9042708096454302e+195,), > (-3.085825028509117e+74,), (-1.9569489404314425e+128,), > (2.0738138203216883e+201,), (inf,), (2.5212410617263588e-282,), > (-2.646144697462316e-35,), (-3.468683249247593e-196,), (nan,), (None,), > (nan,), (1.822129180806602e-245,), (5.211702553315461e-259,), (-1.0,), > (-5.682293414619055e+46,), (-4.585039307326895e+166,), > (-5.936844510098297e-82,), (-5234708055733.116,), (4920675036.053339,), > (None,), (4.4501477170144023e-308,), (2.176024662699802e-210,), > (-5.046677974902737e+132,), (-5.490780063080251e-09,), > (1.703824427218836e-55,), (-1.1961155424160076e+102,), > (1.4403274475565667e+41,), (None,), (5.4470705929955455e-86,), > (5.120795466142678e-215,), (-9.01991342808203e+282,), > (4.051866849943636e-254,), (-3588518231990.927,), (-1.8891559842111865e+63,), > (3.4543959813437507e-304,), (-7.590734560275502e-63,), > (9.376528689861087e+117,), 
(-2.1696969883753554e-292,), > (7.227411393136537e+206,), (-2.428999624265911e-293,), > (5.741383583382542e-14,), (-1.4882040107841963e+286,), > (2.1973064836362255e-159,), (0.028096279323357867,), > (8.475809563703283e-64,), (3.002803065141241e-139,), > (-1.1041009815645263e+203,), (1.8461539468514548e-225,), > (-5.620339412794757e-251,), (3.5103766991437114e-60,), > (2.4925669515657655e+165,), (3.217759099462207e+108,), > (-8.796717685143486e+203,), (2.037360925124577e+292,), > (-6.542279108216022e+206,), (-7.951172614280046e-74,), > (6.226527569272003e+152,), (-5.673977270111637e-84,), > (-1.0186016078084965e-281,), (1.7976931348623157e+308,), > (4.205809391029644e+137,), (-9.871721037428167e+119,), (None,), > (-1.6663254121185628e-256,), (1.0075153091760986e-236,), (-0.0,), (0.0,), > (1.7976931348623157e+308,), (4.3214483342777574e-117,), > (-7.973642629411105e-89,), (-1.1028137694801181e-297,), > (2.9000325280299273e-39,), (-1.077534929323113e-264,), > (-1.1847952892216515e+137,), (nan,), (7.849390806334983e+226,), > (-1.831402251805194e+65,), (-2.664533698035492e+203,), > (-2.2385155698231885e+285,), (-2.3016388448634844e-155,), > (-9.607772864590422e+217,), (3.437191836077251e+209,), > (1.9846569552093057e-137,), (-3.010452936419635e-233,), > (1.4309793775440402e-87,), (-2.9383643865423363e-103,), > (-4.696878567317712e-162,), (8.391630779050713e-135,), (nan,), > (-3.3885098786542755e-128,), (-4.5154178008513483e-122,), (nan,), (nan,), > (2.187766760184779e+306,), (7.679268835670585e+223,), > (6.3131466321042515e+153,), (1.779652973678931e+173,), > (9.247723870123388e-295,), (5.891823952773268e+98,), (inf,), > (1.9042708096454302e+195,), (-3.085825028509117e+74,), > (-1.9569489404314425e+128,), (2.0738138203216883e+201,), (inf,), > (2.5212410617263588e-282,), (-2.646144697462316e-35,), > (-3.468683249247593e-196,), (nan,), (None,), (nan,), > (1.822129180806602e-245,), (5.211702553315461e-259,), (-1.0,), > (-5.682293414619055e+46,), 
(-4.585039307326895e+166,), > (-5.936844510098297e-82,), (-5234708055733.116,), (4920675036.053339,), > (None,), (4.4501477170144023e-308,), (2.176024662699802e-210,), > (-5.046677974902737e+132,), (-5.490780063080251e-09,), > (1.703824427218836e-55,), (-1.1961155424160076e+102,), > (1.4403274475565667e+41,), (None,), (5.4470705929955455e-86,), > (5.120795466142678e-215,), (-9.01991342808203e+282,), > (4.051866849943636e-254,), (-3588518231990.927,), (-1.8891559842111865e+63,), > (3.4543959813437507e-304,), (-7.590734560275502e-63,), > (9.376528689861087e+117,), (-2.1696969883753554e-292,), > (7.227411393136537e+206,), (-2.428999624265911e-293,), > (5.741383583382542e-14,), (-1.4882040107841963e+286,), >
[jira] [Assigned] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45599: --- Assignee: Nicholas Chammas > Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset > -- > > Key: SPARK-45599 > URL: https://issues.apache.org/jira/browse/SPARK-45599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.6.3, 3.3.0, 3.2.3, 3.5.0 >Reporter: Robert Joseph Evans >Assignee: Nicholas Chammas >Priority: Critical > Labels: correctness, pull-request-available >
[jira] [Resolved] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45599. - Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 45036 [https://github.com/apache/spark/pull/45036] > Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset > -- > > Key: SPARK-45599 > URL: https://issues.apache.org/jira/browse/SPARK-45599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.6.3, 3.3.0, 3.2.3, 3.5.0 >Reporter: Robert Joseph Evans >Assignee: Nicholas Chammas >Priority: Critical > Labels: correctness, pull-request-available > Fix For: 3.5.1, 4.0.0 >
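The mechanism the SPARK-45599 report points at can be demonstrated outside Spark: in IEEE-754, -0.0 and 0.0 compare equal but carry different bit patterns, so a hash structure that distinguishes them (as the report says Spark's OpenHashMap did) can count them in separate buckets while a sort treats them as one value. The following Python sketch is illustrative only, not Spark's actual code; it keys a counting dict on the raw double bits to reproduce the split:

```python
import struct

neg, pos = -0.0, 0.0

# Value comparison: IEEE-754 treats the two zeros as equal.
assert neg == pos

# Bit patterns: the sign bit differs, so bit-based hashing separates them.
assert struct.pack('>d', neg) != struct.pack('>d', pos)

# A counting map keyed on raw bits splits what a sort sees as a single value,
# which is how a percentile computed from (value, count) pairs can go wrong.
counts = {}
for x in [0.0, -0.0, 0.0, -0.0, -0.0]:
    key = struct.pack('>d', x)
    counts[key] = counts.get(key, 0) + 1
print(sorted(counts.values()))  # two buckets instead of one: [2, 3]
```

Note that a plain Python dict keyed on the floats themselves would merge the two zeros (they are equal and hash equally), which is why the sketch keys on the packed bits to mimic bit-sensitive hashing.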