[jira] [Updated] (SPARK-48302) Preserve nulls in map columns in PyArrow Tables
[ https://issues.apache.org/jira/browse/SPARK-48302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48302: --- Labels: pull-request-available (was: ) > Preserve nulls in map columns in PyArrow Tables > --- > > Key: SPARK-48302 > URL: https://issues.apache.org/jira/browse/SPARK-48302 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ian Cook >Priority: Major > Labels: pull-request-available > > Because of a limitation in PyArrow, when PyArrow Tables containing MapArray columns with nested fields or timestamps are passed to {{spark.createDataFrame()}}, null values in the MapArray columns are replaced with empty lists. The PySpark function where this happens is {{pyspark.sql.pandas.types._check_arrow_array_timestamps_localize}}. Also see [https://github.com/apache/arrow/issues/41684]. See the skipped tests and the TODO mentioning SPARK-48302. [Update] A fix for this has been implemented in PyArrow in [https://github.com/apache/arrow/pull/41757] by adding a {{mask}} argument to {{pa.MapArray.from_arrays}}. This will be released in PyArrow 17.0.0. Since older versions of PyArrow (which PySpark will still support for a while) won't have this argument, we will need to do a check like: {{LooseVersion(pa.__version__) >= LooseVersion("17.0.0")}} or {{from inspect import signature}} {{"mask" in signature(pa.MapArray.from_arrays).parameters}} and only pass {{mask}} if that's true. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48499) Use Math.abs to get positive numbers
[ https://issues.apache.org/jira/browse/SPARK-48499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48499: --- Labels: pull-request-available (was: ) > Use Math.abs to get positive numbers > > > Key: SPARK-48499 > URL: https://issues.apache.org/jira/browse/SPARK-48499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Junqing Li >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48497) Add user guide for batch data source write API
[ https://issues.apache.org/jira/browse/SPARK-48497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48497: --- Labels: pull-request-available (was: ) > Add user guide for batch data source write API > -- > > Key: SPARK-48497 > URL: https://issues.apache.org/jira/browse/SPARK-48497 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Add examples for batch data source write.
[jira] [Updated] (SPARK-48466) Wrap empty relation propagation in a dedicated node
[ https://issues.apache.org/jira/browse/SPARK-48466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48466: --- Labels: pull-request-available (was: ) > Wrap empty relation propagation in a dedicated node > --- > > Key: SPARK-48466 > URL: https://issues.apache.org/jira/browse/SPARK-48466 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Priority: Major > Labels: pull-request-available > > Currently we replace the plan with a LocalTableScan in case of empty relation propagation, which loses the information about the original query plan and makes it less human-readable. The idea is to create a dedicated `EmptyRelation` node, which is a leaf node but wraps the original query plan inside.
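The idea above can be sketched minimally in Python for illustration; Spark's actual node would be a Scala logical operator, and all names here are assumptions, not the real implementation:

```python
class EmptyRelation:
    """Leaf node that yields no rows but keeps the replaced plan for display."""

    def __init__(self, original_plan):
        # The plan that empty relation propagation replaced, retained so
        # explain output stays human-readable.
        self.original_plan = original_plan

    @property
    def children(self):
        # A leaf node: the wrapped plan is kept for display only and is
        # never treated as a child to be optimized or executed.
        return []

    def explain(self) -> str:
        return f"EmptyRelation [original plan: {self.original_plan}]"
```

The key design point is that the wrapped plan is not a child of the node, so the optimizer still sees a leaf, while tooling can surface the original query shape.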
[jira] [Updated] (SPARK-48491) Refactor HiveWindowFunctionQuerySuite in BeforeAll and AfterAll
[ https://issues.apache.org/jira/browse/SPARK-48491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48491: --- Labels: pull-request-available (was: ) > Refactor HiveWindowFunctionQuerySuite in BeforeAll and AfterAll > --- > > Key: SPARK-48491 > URL: https://issues.apache.org/jira/browse/SPARK-48491 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48494) Update airlift:aircompressor to 0.27
[ https://issues.apache.org/jira/browse/SPARK-48494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48494: --- Labels: pull-request-available (was: ) > Update airlift:aircompressor to 0.27 > > > Key: SPARK-48494 > URL: https://issues.apache.org/jira/browse/SPARK-48494 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > > [CVE-2024-36114|https://www.cve.org/CVERecord?id=CVE-2024-36114]
[jira] [Updated] (SPARK-48493) Enhance Python Datasource Reader with Arrow Batch Support for Improved Performance
[ https://issues.apache.org/jira/browse/SPARK-48493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48493: --- Labels: pull-request-available (was: ) > Enhance Python Datasource Reader with Arrow Batch Support for Improved Performance > -- > > Key: SPARK-48493 > URL: https://issues.apache.org/jira/browse/SPARK-48493 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Luca Canali >Priority: Minor > Labels: pull-request-available > > This proposes an enhancement to the Python Datasource Reader by adding an option to yield Arrow batches directly, significantly boosting performance compared to using tuples or Rows. This implementation builds on the existing work with MapInArrow (see SPARK-46253). In tests with a custom Python Datasource for High Energy Physics data (ROOT format reader), using Arrow batches has demonstrated an 8x speed increase over the traditional method of feeding data via tuples.
[jira] [Updated] (SPARK-48490) Unescapes any literals for message of MessageWithContext
[ https://issues.apache.org/jira/browse/SPARK-48490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48490: --- Labels: pull-request-available (was: ) > Unescapes any literals for message of MessageWithContext > > > Key: SPARK-48490 > URL: https://issues.apache.org/jira/browse/SPARK-48490 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Critical > Labels: pull-request-available
[jira] [Updated] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48489: --- Labels: pull-request-available (was: ) > Throw a user-facing error when reading invalid schema from text DataSource > --- > > Key: SPARK-48489 > URL: https://issues.apache.org/jira/browse/SPARK-48489 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Stefan Bukorovic >Priority: Minor > Labels: pull-request-available > > Text DataSource produces a table schema with only 1 column, but it is possible to try to create a table with a schema having multiple columns. Currently, when a user tries this, an assert in the code fails and throws an internal Spark error. We should throw a better user-facing error instead.
[jira] [Updated] (SPARK-47690) Hash aggregate support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-47690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47690: --- Labels: pull-request-available (was: ) > Hash aggregate support for strings with collation > - > > Key: SPARK-47690 > URL: https://issues.apache.org/jira/browse/SPARK-47690 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48488) Restore the original logic of methods `log[info|warning|error]` in `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48488: --- Labels: pull-request-available (was: ) > Restore the original logic of methods `log[info|warning|error]` in `SparkSubmit` > > > Key: SPARK-48488 > URL: https://issues.apache.org/jira/browse/SPARK-48488 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Critical > Labels: pull-request-available
[jira] [Updated] (SPARK-48487) Update License & Notice according to the dependency changes
[ https://issues.apache.org/jira/browse/SPARK-48487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48487: --- Labels: pull-request-available (was: ) > Update License & Notice according to the dependency changes > --- > > Key: SPARK-48487 > URL: https://issues.apache.org/jira/browse/SPARK-48487 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-48476) NPE thrown when delimiter set to null in CSV
[ https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48476: -- Assignee: (was: Apache Spark) > NPE thrown when delimiter set to null in CSV > > > Key: SPARK-48476 > URL: https://issues.apache.org/jira/browse/SPARK-48476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > When customers set the delimiter to null, we currently throw an NPE. We should throw a customer-facing error instead. Repro: spark.read.format("csv").option("delimiter", null).load()
[jira] [Assigned] (SPARK-48476) NPE thrown when delimiter set to null in CSV
[ https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48476: -- Assignee: Apache Spark > NPE thrown when delimiter set to null in CSV > > > Key: SPARK-48476 > URL: https://issues.apache.org/jira/browse/SPARK-48476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > When customers set the delimiter to null, we currently throw an NPE. We should throw a customer-facing error instead. Repro: spark.read.format("csv").option("delimiter", null).load()
[jira] [Assigned] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47258: -- Assignee: Apache Spark > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't already exist. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. This way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear, and propose a solution so users know how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
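The checkError() pattern described above can be illustrated with a small Python sketch. Spark's real helper lives in its Scala test utilities; the class, helper, and error-class names below are assumptions for illustration only:

```python
class SparkLikeError(Exception):
    """Stand-in for a Spark exception carrying structured error fields."""

    def __init__(self, error_class, message_parameters):
        super().__init__(error_class)
        self.error_class = error_class
        self.message_parameters = message_parameters

def check_error(exc, error_class, message_parameters):
    """Assert on stable fields only, never on the rendered message text,
    so editors can reword templates without breaking tests."""
    assert exc.error_class == error_class
    assert exc.message_parameters == message_parameters

# Usage: trigger the error from user-facing code, then check its fields.
try:
    raise SparkLikeError("UNSUPPORTED_SHOW_CREATE_TABLE",  # hypothetical name
                         {"tableName": "v1"})
except SparkLikeError as e:
    check_error(e, "UNSUPPORTED_SHOW_CREATE_TABLE", {"tableName": "v1"})
```

This is why the ticket asks to migrate tests onto checkError(): tests bound to error text break on every wording change, while tests bound to the error class and parameters do not.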
[jira] [Assigned] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47258: -- Assignee: (was: Apache Spark) > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't already exist. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. This way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear, and propose a solution so users know how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
[ https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48485: --- Labels: pull-request-available (was: ) > Support interruptTag and interruptAll in streaming queries > -- > > Key: SPARK-48485 > URL: https://issues.apache.org/jira/browse/SPARK-48485 > Project: Spark > Issue Type: Improvement > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Updated] (SPARK-38506) Push partial aggregation through join
[ https://issues.apache.org/jira/browse/SPARK-38506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-38506: --- Labels: pull-request-available (was: ) > Push partial aggregation through join > - > > Key: SPARK-38506 > URL: https://issues.apache.org/jira/browse/SPARK-38506 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > > Please see https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Request-and-Transaction-Processing/Join-Planning-and-Optimization/Partial-GROUP-BY-Block-Optimization for more details.
[jira] [Updated] (SPARK-48484) V2Write use the same TaskAttemptId for different task attempts
[ https://issues.apache.org/jira/browse/SPARK-48484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48484: --- Labels: pull-request-available (was: ) > V2Write use the same TaskAttemptId for different task attempts > -- > > Key: SPARK-48484 > URL: https://issues.apache.org/jira/browse/SPARK-48484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48482: --- Labels: pull-request-available (was: ) > dropDuplicates and dropDuplicatesWithinWatermark should accept varargs > -- > > Key: SPARK-48482 > URL: https://issues.apache.org/jira/browse/SPARK-48482 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48476) NPE thrown when delimiter set to null in CSV
[ https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48476: --- Labels: pull-request-available (was: ) > NPE thrown when delimiter set to null in CSV > > > Key: SPARK-48476 > URL: https://issues.apache.org/jira/browse/SPARK-48476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > When customers set the delimiter to null, we currently throw an NPE. We should throw a customer-facing error instead. Repro: spark.read.format("csv").option("delimiter", null).load()
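A minimal sketch of the proposed behavior: validate the option up front and raise a structured, user-facing error instead of letting the null surface as an NPE. The error class and helper names here are assumptions, not Spark's actual implementation:

```python
class IllegalOptionError(ValueError):
    """Stand-in for a user-facing Spark error with a stable error class."""

    def __init__(self, error_class, message):
        super().__init__(message)
        self.error_class = error_class

def validate_csv_delimiter(options: dict) -> str:
    """Reject a null delimiter early, before the CSV parser dereferences it."""
    delimiter = options.get("delimiter")
    if delimiter is None:
        raise IllegalOptionError(
            "INVALID_CSV_OPTION",  # hypothetical error class
            "CSV option 'delimiter' must not be null.",
        )
    return delimiter
```

Validating at option-parsing time means the user sees which option is wrong, rather than a NullPointerException from deep inside the reader.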
[jira] [Updated] (SPARK-48479) Support creating SQL functions in parser
[ https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48479: --- Labels: pull-request-available (was: ) > Support creating SQL functions in parser > > > Key: SPARK-48479 > URL: https://issues.apache.org/jira/browse/SPARK-48479 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Add Spark SQL parser for creating SQL functions.
[jira] [Updated] (SPARK-48465) Avoid no-op empty relation propagation in AQE
[ https://issues.apache.org/jira/browse/SPARK-48465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48465: --- Labels: pull-request-available (was: ) > Avoid no-op empty relation propagation in AQE > - > > Key: SPARK-48465 > URL: https://issues.apache.org/jira/browse/SPARK-48465 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Priority: Major > Labels: pull-request-available > > We should avoid no-op empty relation propagation in AQE: if we convert an empty QueryStageExec to an empty relation, it will be wrapped into a new query stage and executed, producing an empty result and triggering empty relation propagation again. This issue is currently not exposed because AQE will try to reuse the shuffle.
[jira] [Updated] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest
[ https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48477: --- Labels: pull-request-available (was: ) > Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest > > > Key: SPARK-48477 > URL: https://issues.apache.org/jira/browse/SPARK-48477 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48474: --- Labels: pull-request-available (was: ) > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-42252) Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config
[ https://issues.apache.org/jira/browse/SPARK-42252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42252: --- Labels: pull-request-available (was: ) > Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config > -- > > Key: SPARK-42252 > URL: https://issues.apache.org/jira/browse/SPARK-42252 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0 >Reporter: Wei Guo >Priority: Minor > Labels: pull-request-available > > After SPARK-28209 and [PR 25007|https://github.com/apache/spark/pull/25007], a new shuffle writer API was introduced. All shuffle writers (BypassMergeSortShuffleWriter, SortShuffleWriter, UnsafeShuffleWriter) are based on LocalDiskShuffleMapOutputWriter to write local disk shuffle files. The config spark.shuffle.unsafe.file.output.buffer used in LocalDiskShuffleMapOutputWriter was previously used only in UnsafeShuffleWriter, so it's better to rename it to something more suitable.
[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48318: -- Assignee: Apache Spark > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48318: -- Assignee: (was: Apache Spark) > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48471: -- Assignee: (was: Apache Spark) > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48471: -- Assignee: Apache Spark > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47260: -- Assignee: (was: Apache Spark) > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't already exist. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. This way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear, and propose a solution so users know how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47260: -- Assignee: Apache Spark > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't already exist. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message. This way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear, and propose a solution so users know how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48471: --- Labels: pull-request-available (was: ) > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48280) Add Expression Walker for Testing
[ https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48280: --- Labels: pull-request-available (was: ) > Add Expression Walker for Testing > - > > Key: SPARK-48280 > URL: https://issues.apache.org/jira/browse/SPARK-48280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48468) Add LogicalQueryStage interface in catalyst
[ https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48468: --- Labels: pull-request-available (was: ) > Add LogicalQueryStage interface in catalyst > --- > > Key: SPARK-48468 > URL: https://issues.apache.org/jira/browse/SPARK-48468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Priority: Major > Labels: pull-request-available > > Add `LogicalQueryStage` interface in catalyst so that it's visible in logical > rules -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48467: --- Labels: pull-request-available (was: ) > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48446: --- Labels: easyfix pull-request-available (was: easyfix) > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]: > the argument should be a list of column names. > The same discrepancy applies to dropDuplicatesWithinWatermark. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
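The corrected call described above passes a list, e.g. df.dropDuplicates(["guid"]) rather than df.dropDuplicates("guid"). The semantics of deduplicating by a subset of columns can be sketched in pure Python (no PySpark required; this is an illustration of the behavior, not PySpark's implementation):

```python
# Keep the first row seen for each distinct value of the subset columns,
# mirroring what DataFrame.dropDuplicates(["guid"]) does.
def drop_duplicates(rows, subset):
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[c] for c in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"guid": "a", "v": 1},
    {"guid": "a", "v": 2},   # same guid as the first row: dropped
    {"guid": "b", "v": 3},
]
result = drop_duplicates(rows, ["guid"])  # keeps one row per guid
```

Passing the subset as a list is what makes multi-column deduplication possible, e.g. drop_duplicates(rows, ["guid", "v"]).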
[jira] [Updated] (SPARK-48464) Refactor SQLConfSuite and StatisticsSuite
[ https://issues.apache.org/jira/browse/SPARK-48464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48464: --- Labels: pull-request-available (was: ) > Refactor SQLConfSuite and StatisticsSuite > - > > Key: SPARK-48464 > URL: https://issues.apache.org/jira/browse/SPARK-48464 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48461: --- Labels: pull-request-available (was: ) > Replace NullPointerExceptions with proper error classes in AssertNotNull > expression > --- > > Key: SPARK-48461 > URL: https://issues.apache.org/jira/browse/SPARK-48461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > > [Code location > here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48462) Refactor HiveQuerySuite.scala and HiveTableScanSuite
[ https://issues.apache.org/jira/browse/SPARK-48462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48462: --- Labels: pull-request-available (was: ) > Refactor HiveQuerySuite.scala and HiveTableScanSuite > > > Key: SPARK-48462 > URL: https://issues.apache.org/jira/browse/SPARK-48462 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48447) Check state store provider class before invoking the constructor
[ https://issues.apache.org/jira/browse/SPARK-48447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48447: --- Labels: pull-request-available (was: ) > Check state store provider class before invoking the constructor > > > Key: SPARK-48447 > URL: https://issues.apache.org/jira/browse/SPARK-48447 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Priority: Major > Labels: pull-request-available > Original Estimate: 24h > Remaining Estimate: 24h > > We should ensure that only classes > [extending|https://github.com/databricks/runtime/blob/1440e77ab54c40981066c22ec759bdafc0683e76/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L73] > {{StateStoreProvider}} can be constructed, to prevent users from > instantiating arbitrary classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
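The guard described above — validating that a dynamically loaded class actually extends StateStoreProvider before calling its constructor — can be sketched in Python (Spark's real implementation is in Scala; the names below are illustrative):

```python
import importlib

class StateStoreProvider:
    """Base class that loadable state store providers must extend."""

def instantiate_checked(cls):
    # Verify the type BEFORE calling the constructor, so a configuration
    # value cannot be used to instantiate an arbitrary class.
    if not (isinstance(cls, type) and issubclass(cls, StateStoreProvider)):
        raise TypeError(f"{cls!r} does not extend StateStoreProvider")
    return cls()

def create_provider(class_path):
    # Resolve "pkg.module.ClassName" to a class, then type-check it.
    module_name, _, class_name = class_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return instantiate_checked(cls)

class MyProvider(StateStoreProvider):
    pass

provider = instantiate_checked(MyProvider)   # accepted: proper subclass
```

With this check in place, something like create_provider("decimal.Decimal") fails with a TypeError instead of silently constructing an unrelated object.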
[jira] [Updated] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48459: --- Labels: pull-request-available (was: ) > Implement DataFrameQueryContext in Spark Connect > > > Key: SPARK-48459 > URL: https://issues.apache.org/jira/browse/SPARK-48459 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Implements the same https://github.com/apache/spark/pull/45377 in Spark > Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48454: -- Assignee: (was: Apache Spark) > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48454: -- Assignee: Apache Spark > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48416: -- Assignee: (was: Apache Spark) > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48416: -- Assignee: Apache Spark > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48454: --- Labels: pull-request-available (was: ) > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48449) Using `getPropertyInfo` instead of `reflection` to obtain `JAAS application name`
[ https://issues.apache.org/jira/browse/SPARK-48449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48449: --- Labels: pull-request-available (was: ) > Using `getPropertyInfo` instead of `reflection` to obtain `JAAS application > name` > - > > Key: SPARK-48449 > URL: https://issues.apache.org/jira/browse/SPARK-48449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48445: --- Labels: pull-request-available (was: ) > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
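The regression mechanism described above can be made concrete: when two projections are collapsed, an expression referenced more than once is inlined at each reference, so any expensive sub-expression inside it runs multiple times. A hypothetical pure-Python sketch of the effect (not Spark's optimizer code; names are invented):

```python
# Count how often an "expensive" child expression is evaluated when an
# intermediate result is inlined into every place that references it.
calls = {"n": 0}

def expensive_child(x):
    calls["n"] += 1          # stands in for a costly, non-cheap expression
    return x * 10

def udf(x):                  # "cheap" wrapper around a non-cheap child
    return expensive_child(x) + 1

# Before collapsing: Project(a = udf(x)) feeding Project(b = a + a).
a = udf(3)
before = a + a               # expensive_child evaluated once

# After collapsing the two Projects, the udf call is inlined twice:
after = udf(3) + udf(3)      # expensive_child evaluated two more times

assert before == after       # same result, but more work was done
print(calls["n"])  # 3
```

The results agree, but the collapsed form evaluated the expensive child twice instead of once, which is exactly the regression the issue guards against.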
[jira] [Updated] (SPARK-48442) Add parenthesis to awaitTermination call
[ https://issues.apache.org/jira/browse/SPARK-48442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48442: --- Labels: correctness pull-request-available starter (was: correctness starter) > Add parenthesis to awaitTermination call > > > Key: SPARK-48442 > URL: https://issues.apache.org/jira/browse/SPARK-48442 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.3 >Reporter: Riya Verma >Priority: Trivial > Labels: correctness, pull-request-available, starter > > In {{test_stream_reader}} and {{test_stream_writer}} of > {*}test_python_streaming_datasource.py{*}, the call {{q.awaitTermination}} > does not invoke the method as intended; it merely returns the bound method > object. The fix is to change this to {{{}q.awaitTermination(){}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
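The bug pattern is easy to reproduce in plain Python: referencing a method without parentheses yields a bound-method object (which is truthy and raises no error), so the test silently never waits. An illustrative sketch with a stand-in class, not the actual PySpark test code:

```python
class Query:
    # Stand-in for a streaming query handle with an awaitTermination method.
    def awaitTermination(self):
        return "terminated"

q = Query()

wrong = q.awaitTermination      # no parentheses: just a bound method object
right = q.awaitTermination()    # actually invokes the method

assert callable(wrong)          # nothing ran; we only got a reference
assert right == "terminated"    # the method body executed
print(type(wrong).__name__)  # method
```

Because the bare reference is truthy and raises nothing, linters (or tests that assert on the return value) are the practical way to catch this class of mistake.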
[jira] [Updated] (SPARK-48444) Refactor SQLQuerySuite
[ https://issues.apache.org/jira/browse/SPARK-48444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48444: --- Labels: pull-request-available (was: ) > Refactor SQLQuerySuite > -- > > Key: SPARK-48444 > URL: https://issues.apache.org/jira/browse/SPARK-48444 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47260: --- Labels: pull-request-available starter (was: starter) > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (see the examples in error-classes.json). > Add a test that triggers the error from user code if one doesn't already > exist. Check the exception fields using {*}checkError(){*}. That function > checks only the meaningful error fields and avoids depending on the error > message text, so technical editors can modify the error format in > error-classes.json without breaking Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (via a SQL query), replace > it with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current wording > is unclear, and propose to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47361) Improve JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47361: --- Labels: pull-request-available releasenotes (was: releasenotes) > Improve JDBC data sources > - > > Key: SPARK-47361 > URL: https://issues.apache.org/jira/browse/SPARK-47361 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available, releasenotes > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48438) Directly use the parent column class
[ https://issues.apache.org/jira/browse/SPARK-48438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48438: --- Labels: pull-request-available (was: ) > Directly use the parent column class > > > Key: SPARK-48438 > URL: https://issues.apache.org/jira/browse/SPARK-48438 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48437) Improve IsolatedClientLoader
[ https://issues.apache.org/jira/browse/SPARK-48437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48437: --- Labels: pull-request-available (was: ) > Improve IsolatedClientLoader > > > Key: SPARK-48437 > URL: https://issues.apache.org/jira/browse/SPARK-48437 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48436) Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite`
[ https://issues.apache.org/jira/browse/SPARK-48436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48436: --- Labels: pull-request-available (was: ) > Use `c.m.c.j.Driver` instead of `c.m.j.Driver` in `MySQLNamespaceSuite` > --- > > Key: SPARK-48436 > URL: https://issues.apache.org/jira/browse/SPARK-48436 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47258: -- Assignee: (was: Apache Spark) > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (see the examples in error-classes.json). > Add a test that triggers the error from user code if one doesn't already > exist. Check the exception fields using {*}checkError(){*}. That function > checks only the meaningful error fields and avoids depending on the error > message text, so technical editors can modify the error format in > error-classes.json without breaking Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > it with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current wording > is unclear, and propose to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48435) UNICODE collation should not support binary equality
[ https://issues.apache.org/jira/browse/SPARK-48435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48435: -- Assignee: (was: Apache Spark) > UNICODE collation should not support binary equality > > > Key: SPARK-48435 > URL: https://issues.apache.org/jira/browse/SPARK-48435 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48435) UNICODE collation should not support binary equality
[ https://issues.apache.org/jira/browse/SPARK-48435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48435: -- Assignee: Apache Spark > UNICODE collation should not support binary equality > > > Key: SPARK-48435 > URL: https://issues.apache.org/jira/browse/SPARK-48435 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47258: -- Assignee: Apache Spark > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. That function > checks only the important error fields, and avoids depending on the error > message text. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Suggest to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47258: -- Assignee: (was: Apache Spark) > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. That function > checks only the important error fields, and avoids depending on the error > message text. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Suggest to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47258: -- Assignee: Apache Spark > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. That function > checks only the important error fields, and avoids depending on the error > message text. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Suggest to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48435) UNICODE collation should not support binary equality
[ https://issues.apache.org/jira/browse/SPARK-48435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48435: --- Labels: pull-request-available (was: ) > UNICODE collation should not support binary equality > > > Key: SPARK-48435 > URL: https://issues.apache.org/jira/browse/SPARK-48435 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47258) Assign error classes to SHOW CREATE TABLE errors
[ https://issues.apache.org/jira/browse/SPARK-47258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47258: --- Labels: pull-request-available starter (was: starter) > Assign error classes to SHOW CREATE TABLE errors > > > Key: SPARK-47258 > URL: https://issues.apache.org/jira/browse/SPARK-47258 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_127[0-5]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. That function > checks only the important error fields, and avoids depending on the error > message text. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Suggest to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48434: --- Labels: pull-request-available (was: ) > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48433) Upgrade `checkstyle` to 10.17.0
[ https://issues.apache.org/jira/browse/SPARK-48433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48433: --- Labels: pull-request-available (was: ) > Upgrade `checkstyle` to 10.17.0 > --- > > Key: SPARK-48433 > URL: https://issues.apache.org/jira/browse/SPARK-48433 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48431) Do not forward predicates on collated columns to file readers
[ https://issues.apache.org/jira/browse/SPARK-48431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48431: --- Labels: pull-request-available (was: ) > Do not forward predicates on collated columns to file readers > - > > Key: SPARK-48431 > URL: https://issues.apache.org/jira/browse/SPARK-48431 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jan-Ole Sasse >Priority: Major > Labels: pull-request-available > > SPARK-47657 allows pushing filters on collated columns to file sources that > support it. If such filters are pushed to file sources, those file sources > must not push those filters to the actual file readers (i.e. Parquet or CSV > readers), because there is no guarantee that those readers support collations. > With this task, we widen filters on collated columns to AlwaysTrue when > translating filters for file sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
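The widening described above can be pictured with a small, purely illustrative sketch (none of these names are Spark APIs): predicates that reference collated columns are replaced with an always-true filter before they reach the file reader, while other predicates pass through unchanged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    """Toy stand-in for a data source filter: column, operator, value."""
    column: str
    op: str
    value: object

# Sentinel standing in for Spark's AlwaysTrue predicate.
ALWAYS_TRUE = Filter(column="", op="always_true", value=None)

def widen_for_file_reader(filters, collated_columns):
    """Widen predicates on collated columns to AlwaysTrue.

    The file reader (Parquet, CSV, ...) cannot be assumed to compare strings
    collation-aware, so such predicates must not be pushed down; replacing
    them with AlwaysTrue is safe because it only makes the scan less
    selective, and Spark re-evaluates the filter on the returned rows.
    """
    return [ALWAYS_TRUE if f.column in collated_columns else f
            for f in filters]

filters = [Filter("name", "=", "Ana"), Filter("age", ">", 30)]
widened = widen_for_file_reader(filters, collated_columns={"name"})
```

Dropping the predicate entirely would also be correct for the reader, but widening to AlwaysTrue keeps the filter list positionally aligned with the pushed filters.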
[jira] [Updated] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48432: --- Labels: pull-request-available (was: ) > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > `tokenIndexArr` is created as an array of `java.lang.Integer`. However, it > is used not only by the wrapped Java parser, but also during parsing to > identify the correct token index, which incurs unnecessary unboxing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48430) Fix map value extraction when map contains collated strings
[ https://issues.apache.org/jira/browse/SPARK-48430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48430: --- Labels: pull-request-available (was: ) > Fix map value extraction when map contains collated strings > --- > > Key: SPARK-48430 > URL: https://issues.apache.org/jira/browse/SPARK-48430 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Priority: Major > Labels: pull-request-available > > The following queries return unexpected results: > {code:java} > select collation(map('a', 'b' collate utf8_binary_lcase)['a']); > select collation(element_at(map('a', 'b' collate utf8_binary_lcase), > 'a'));{code} > Both return UTF8_BINARY instead of UTF8_BINARY_LCASE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
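The expected semantics can be sketched with a plain-Python analogy (illustrative only, not Spark code): model a collated string as a (text, collation) pair and check that extracting a value from a map preserves the value's collation instead of silently reverting it to the default.

```python
def element_at(mapping, key):
    """Toy element_at: return the stored (text, collation) pair untouched,
    or None (Spark's NULL) when the key is absent."""
    return mapping.get(key)

def collation(value):
    """Toy collation(): report the collation attached to a value."""
    _text, coll = value
    return coll

# Mirrors map('a', 'b' collate utf8_binary_lcase) from the report: the
# value 'b' carries the UTF8_BINARY_LCASE collation, and extraction
# should keep it rather than returning UTF8_BINARY.
m = {"a": ("b", "UTF8_BINARY_LCASE")}
```

In the bug, the extraction expression re-derived the result type from the default string type, which is how the attached collation was lost.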
[jira] [Updated] (SPARK-39743) Unable to set zstd compression level while writing parquet files
[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39743: --- Labels: pull-request-available (was: ) > Unable to set zstd compression level while writing parquet files > > > Key: SPARK-39743 > URL: https://issues.apache.org/jira/browse/SPARK-39743 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Yeachan Park >Assignee: ming95 >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > While writing zstd-compressed parquet files, the > `spark.io.compression.zstd.level` setting has no effect on the zstd > compression level. > All files seem to be written with the default zstd compression level, and the > config option seems to be ignored. > Using the zstd CLI tool, we confirmed that compressing the same file tested in > Spark at a higher level resulted in a smaller file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
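For context, `spark.io.compression.zstd.level` configures Spark's internal block compression (shuffle, broadcast, event logs), not the Parquet writer, which is one reason the setting appears to be ignored. A hedged configuration sketch that does reach the Parquet zstd codec, assuming a bundled parquet-mr version that reads `parquet.compression.codec.zstd.level` from the Hadoop configuration:

```
# spark-defaults.conf (sketch; property support depends on the bundled parquet-mr)
spark.sql.parquet.compression.codec                zstd
spark.hadoop.parquet.compression.codec.zstd.level  9
```

The fix referenced in this issue wires a level setting through for Parquet output, so on fixed versions the Hadoop passthrough should no longer be necessary.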
[jira] [Updated] (SPARK-48426) Make a documentation for SQL Operators with clear precedence defined
[ https://issues.apache.org/jira/browse/SPARK-48426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48426: --- Labels: pull-request-available (was: ) > Make a documentation for SQL Operators with clear precedence defined > > > Key: SPARK-48426 > URL: https://issues.apache.org/jira/browse/SPARK-48426 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48428) IllegalStateException due to nested column aliasing
[ https://issues.apache.org/jira/browse/SPARK-48428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48428: --- Labels: pull-request-available (was: ) > IllegalStateException due to nested column aliasing > --- > > Key: SPARK-48428 > URL: https://issues.apache.org/jira/browse/SPARK-48428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Priority: Major > Labels: pull-request-available > > {code:java} > val f = udf[((Int, Int), Int), ((Int, Int), Int)](identity) > val ds1 = Seq(((1, 2), 1)).toDS > val rhs1 = ds1.select(ds1("_1._1")) > val tmp1 = ds1.join(rhs1, ds1("_1._1") === > rhs1("_1")).select(f(struct(ds1("_1"), > ds1("_2"))).as("tmp1")).select($"tmp1.*") > tmp1.select($"_1._2").collect() {code} > crashes with > {code:java} > java.lang.IllegalStateException: Couldn't find _1#6 in > [_extract__1#35,_2#7,_1#11] > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:699) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:699) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:405) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:94) > at > org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:69) > at > org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:196) > at > org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:151) > at >
[jira] [Updated] (SPARK-48427) Upgrade scala-parser-combinators to 2.4
[ https://issues.apache.org/jira/browse/SPARK-48427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48427: --- Labels: pull-request-available (was: ) > Upgrade scala-parser-combinators to 2.4 > --- > > Key: SPARK-48427 > URL: https://issues.apache.org/jira/browse/SPARK-48427 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > https://github.com/scala/scala-parser-combinators/releases/tag/v2.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48425: --- Labels: pull-request-available (was: ) > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The issue lies in setuptools from 69.X.X onward: > it replaces dashes in package names with underscores > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`). > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
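The renaming comes from newer setuptools adopting PEP 625 sdist file naming: the project name is normalized per PEP 503 and the separator becomes an underscore, so any run of `-`, `_`, or `.` collapses to a single `_` and the name is lowercased. A minimal sketch of that normalization (the function name is ours, not a setuptools API):

```python
import re

def sdist_base_name(project_name: str) -> str:
    # PEP 503 normalization combined with PEP 625's underscore form:
    # collapse every run of "-", "_", "." to a single "_" and lowercase.
    return re.sub(r"[-_.]+", "_", project_name).lower()

# "pyspark-connect" becomes "pyspark_connect", matching the tarball
# name change observed in the report.
```

This is why the build starts emitting `pyspark_connect-4.0.0.dev1.tar.gz` even though the declared project name still contains a dash.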
[jira] [Updated] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48424: --- Labels: pull-request-available (was: ) > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
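The proposal is a fail-open behavior, which can be sketched as follows; `detect_changed_modules` is a hypothetical stand-in for the script's real change-detection logic, and the string result mimics the "true"/"false" the script prints for CI:

```python
def should_run_tests(detect_changed_modules) -> str:
    """Return "true"/"false" for CI, erring on the side of running tests."""
    try:
        return "true" if detect_changed_modules() else "false"
    except Exception:
        # Fail open: an internal error in change detection (e.g. a git
        # failure) must not silently skip the test matrix.
        return "true"

def broken_detector():
    raise RuntimeError("git diff failed")
```

Failing open trades some wasted CI time on errors for the guarantee that a detection bug can never hide a real regression.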
[jira] [Assigned] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48416: -- Assignee: Apache Spark > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48416: -- Assignee: (was: Apache Spark) > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48416: -- Assignee: (was: Apache Spark) > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48416: -- Assignee: Apache Spark > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48421: -- Assignee: (was: Apache Spark) > SPJ: Add documentation > -- > > Key: SPARK-48421 > URL: https://issues.apache.org/jira/browse/SPARK-48421 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Szehon Ho >Priority: Major > Labels: pull-request-available > > As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed > there is no documentation describing the same. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48421: -- Assignee: Apache Spark > SPJ: Add documentation > -- > > Key: SPARK-48421 > URL: https://issues.apache.org/jira/browse/SPARK-48421 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Szehon Ho >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed > there is no documentation describing the same. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48421: --- Labels: pull-request-available (was: ) > SPJ: Add documentation > -- > > Key: SPARK-48421 > URL: https://issues.apache.org/jira/browse/SPARK-48421 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Szehon Ho >Priority: Major > Labels: pull-request-available > > As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed > there is no documentation describing the same. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48420) Upgrade netty to `4.1.110.Final`
[ https://issues.apache.org/jira/browse/SPARK-48420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48420: --- Labels: pull-request-available (was: ) > Upgrade netty to `4.1.110.Final` > > > Key: SPARK-48420 > URL: https://issues.apache.org/jira/browse/SPARK-48420 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48419) Foldable propagation replace foldable column should use origin column
[ https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48419: --- Labels: pull-request-available (was: ) > Foldable propagation replace foldable column should use origin column > - > > Key: SPARK-48419 > URL: https://issues.apache.org/jira/browse/SPARK-48419 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4 >Reporter: KnightChess >Priority: Major > Labels: pull-request-available > > The column name can be changed by `FoldablePropagation` in the optimizer. > Before optimization: > ```shell > 'Project ['x, 'y, 'z] > +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > After optimization: > ```shell > Project [x#112, str AS Y#113, z#114] > +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > The column name `y` is replaced with `Y`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
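The intended behavior can be pictured with a toy model (illustrative only, not Catalyst code): when a foldable inner alias such as `str AS Y` is propagated into the outer projection, the outer column must keep the name it asked for (`y`), and only the underlying expression may be swapped in. Name matching is case-insensitive here, mirroring the `y` vs `Y` collision in the example.

```python
def propagate_foldable(outer_names, foldable_by_name):
    """Return (output_name, folded_literal_or_None) per outer column.

    The literal from the inner projection is substituted in, but the
    surviving name is always the one the outer projection requested.
    """
    folded = {name.lower(): lit for name, lit in foldable_by_name.items()}
    return [(name, folded.get(name.lower())) for name in outer_names]

# Outer Project asks for x, y, z; the inner projection defines the
# foldable alias `str AS Y`. The output name must stay "y", not "Y".
cols = propagate_foldable(["x", "y", "z"], {"Y": "str"})
```

The buggy behavior corresponds to emitting the inner alias's name instead of the outer one, which is exactly the `y` to `Y` rename shown in the plans above.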
[jira] [Updated] (SPARK-47041) PushDownUtils uses FileScanBuilder instead of SupportsPushDownCatalystFilters trait
[ https://issues.apache.org/jira/browse/SPARK-47041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47041: --- Labels: pull-request-available (was: ) > PushDownUtils uses FileScanBuilder instead of SupportsPushDownCatalystFilters > trait > --- > > Key: SPARK-47041 > URL: https://issues.apache.org/jira/browse/SPARK-47041 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Никита Соколов >Priority: Major > Labels: pull-request-available > > PushDownUtils could use the existing, more generic > SupportsPushDownCatalystFilters trait, which looks like it was created for > exactly this purpose, but instead it uses the narrower FileScanBuilder type, > forcing you to extend FileScanBuilder when implementing a ScanBuilder. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48416) Support related nested WITH expression
[ https://issues.apache.org/jira/browse/SPARK-48416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48416: --- Labels: pull-request-available (was: ) > Support related nested WITH expression > -- > > Key: SPARK-48416 > URL: https://issues.apache.org/jira/browse/SPARK-48416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mingliang Zhu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48411) Add E2E test for DropDuplicateWithinWatermark
[ https://issues.apache.org/jira/browse/SPARK-48411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48411: --- Labels: pull-request-available (was: ) > Add E2E test for DropDuplicateWithinWatermark > - > > Key: SPARK-48411 > URL: https://issues.apache.org/jira/browse/SPARK-48411 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > Currently we do not have an e2e test for DropDuplicateWithinWatermark, so we > should add one. We can simply take one of the tests written in Scala here > (using the testStream API) and replicate it in Python: > [https://github.com/apache/spark/commit/0e9e34c1bd9bd16ad5efca77ce2763eb950f3103] > > The change should happen in > [https://github.com/apache/spark/blob/eee179135ed21dbdd8b342d053c9eda849e2de77/python/pyspark/sql/tests/streaming/test_streaming.py#L29] > > so we can test it in both connect and non-connect mode. > > Test with: > ``` > python/run-tests --testnames pyspark.sql.tests.streaming.test_streaming > python/run-tests --testnames > pyspark.sql.tests.connect.streaming.test_parity_streaming > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48415) TypeName support parameterized datatypes
[ https://issues.apache.org/jira/browse/SPARK-48415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48415: --- Labels: pull-request-available (was: ) > TypeName support parameterized datatypes > > > Key: SPARK-48415 > URL: https://issues.apache.org/jira/browse/SPARK-48415 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48414) Fix breaking change in python's `fromJson`
[ https://issues.apache.org/jira/browse/SPARK-48414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48414: --- Labels: pull-request-available (was: ) > Fix breaking change in python's `fromJson` > -- > > Key: SPARK-48414 > URL: https://issues.apache.org/jira/browse/SPARK-48414 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48409) Upgrade MySQL & Postgres & mariadb docker image version
[ https://issues.apache.org/jira/browse/SPARK-48409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48409: --- Labels: pull-request-available (was: ) > Upgrade MySQL & Postgres & mariadb docker image version > --- > > Key: SPARK-48409 > URL: https://issues.apache.org/jira/browse/SPARK-48409 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48413) ALTER COLUMN with collation
[ https://issues.apache.org/jira/browse/SPARK-48413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48413: --- Labels: pull-request-available (was: ) > ALTER COLUMN with collation > --- > > Key: SPARK-48413 > URL: https://issues.apache.org/jira/browse/SPARK-48413 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Priority: Major > Labels: pull-request-available > > Add support for changing collation of a column with ALTER COLUMN command. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48412) Refactor data type json parse
[ https://issues.apache.org/jira/browse/SPARK-48412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48412: --- Labels: pull-request-available (was: ) > Refactor data type json parse > - > > Key: SPARK-48412 > URL: https://issues.apache.org/jira/browse/SPARK-48412 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48410) Fix InitCap expression
[ https://issues.apache.org/jira/browse/SPARK-48410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48410: --- Labels: pull-request-available (was: ) > Fix InitCap expression > -- > > Key: SPARK-48410 > URL: https://issues.apache.org/jira/browse/SPARK-48410 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47257) Assign error classes to ALTER COLUMN errors
[ https://issues.apache.org/jira/browse/SPARK-47257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47257: --- Labels: pull-request-available starter (was: starter) > Assign error classes to ALTER COLUMN errors > --- > > Key: SPARK-47257 > URL: https://issues.apache.org/jira/browse/SPARK-47257 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_105[3-4]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the examples in error-classes.json). > Add a test that triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. The latter > function checks only the valuable error fields and avoids depending on the > error's text message. This way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate > other tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), > replace the error with an internal error, see > {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is > not clear. Propose a solution to users for how to avoid and fix such kinds of > errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48408) Simplify `date_format` & `from_unixtime`
[ https://issues.apache.org/jira/browse/SPARK-48408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48408: --- Labels: pull-request-available (was: ) > Simplify `date_format` & `from_unixtime` > > > Key: SPARK-48408 > URL: https://issues.apache.org/jira/browse/SPARK-48408 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org