[jira] [Updated] (SPARK-48484) V2Write use the same TaskAttemptId for different task attempts
[ https://issues.apache.org/jira/browse/SPARK-48484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48484: --- Labels: pull-request-available (was: ) > V2Write use the same TaskAttemptId for different task attempts > -- > > Key: SPARK-48484 > URL: https://issues.apache.org/jira/browse/SPARK-48484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48484) V2Write use the same TaskAttemptId for different task attempts
Yang Jie created SPARK-48484: Summary: V2Write use the same TaskAttemptId for different task attempts Key: SPARK-48484 URL: https://issues.apache.org/jira/browse/SPARK-48484 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3, 3.5.1, 4.0.0 Reporter: Yang Jie
[jira] [Created] (SPARK-48483) Allow UnsafeExternalSorter to spill when other consumer request memory
Jin Chengcheng created SPARK-48483: -- Summary: Allow UnsafeExternalSorter to spill when other consumer request memory Key: SPARK-48483 URL: https://issues.apache.org/jira/browse/SPARK-48483 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Environment: Ubuntu Reporter: Jin Chengcheng Fix For: 4.0.0

The downstream Gluten (native Spark engine) hits an OOM exception.

{code:java}
24/04/27 11:42:59 ERROR [Executor task launch worker for task 403.0 in stage 4.0 (TID 91404)] nmm.ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 40.0 MiB, granted: 8.0 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application.
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=50.0 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=12.5 GiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=6.3 GiB
Memory consumer stats:
Task.91404: Current used bytes: 12.5 GiB, peak bytes: N/A
+- org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@a7836d4: Current used bytes: 12.4 GiB, peak bytes: N/A
\- Gluten.Tree.194: Current used bytes: 56.0 MiB, peak bytes: 11.7 GiB
   \- root.194: Current used bytes: 56.0 MiB, peak bytes: 11.7 GiB
      +- WholeStageIterator.194: Current used bytes: 32.0 MiB, peak bytes: 9.0 GiB
      | \- single: Current used bytes: 23.0 MiB, peak bytes: 9.0 GiB
      |   +- task.Gluten_Stage_4_TID_91404: Current used bytes: 23.0 MiB, peak bytes: 9.0 GiB
      |   | +- node.3: Current used bytes: 21.0 MiB, peak bytes: 9.0 GiB
      |   | | +- op.3.1.0.HashBuild: Current used bytes: 10.8 MiB, peak bytes: 8.5 GiB
      |   | | \- op.3.0.0.HashProbe: Current used bytes: 9.2 MiB, peak bytes: 21.6 MiB
      |   | +- node.5: Current used bytes: 1024.0 KiB, peak bytes: 2.0 MiB
      |   | | \- op.5.0.0.FilterProject: Current used bytes: 129.4 KiB, peak bytes: 1232.0 KiB
      |   | +- node.2: Current used bytes: 1024.0 KiB, peak bytes: 1024.0 KiB
      |   | | \- op.2.1.0.FilterProject: Current used bytes: 128.4 KiB, peak bytes: 192.4 KiB
      |   | +- node.1: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |   | | \- op.1.1.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |   | +- node.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |   | | \- op.0.0.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |   | \- node.4: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |   |   \- op.4.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |   \- WholeStageIterator_default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- ArrowContextInstance.0: Current used bytes: 8.0 MiB, peak bytes: 8.0 MiB
      +- ColumnarToRow.2: Current used bytes: 8.0 MiB, peak bytes: 16.0 MiB
      | \- single: Current used bytes: 6.0 MiB, peak bytes: 9.0 MiB
      |   \- ColumnarToRow_default_leaf: Current used bytes: 6.0 MiB, peak bytes: 9.0 MiB
      +- ShuffleReader.3: Current used bytes: 8.0 MiB, peak bytes: 16.0 MiB
      | \- single:
[jira] [Updated] (SPARK-48483) Allow UnsafeExternalSorter to spill when other consumer requests memory
[ https://issues.apache.org/jira/browse/SPARK-48483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jin Chengcheng updated SPARK-48483: --- Summary: Allow UnsafeExternalSorter to spill when other consumer requests memory (was: Allow UnsafeExternalSorter to spill when other consumer request memory) > Allow UnsafeExternalSorter to spill when other consumer requests memory > --- > > Key: SPARK-48483 > URL: https://issues.apache.org/jira/browse/SPARK-48483 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 > Environment: Ubuntu >Reporter: Jin Chengcheng >Priority: Major > Fix For: 4.0.0 > > > The downstream Gluten (native Spark engine) hits an OOM exception.
[jira] [Updated] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48482: --- Labels: pull-request-available (was: ) > dropDuplicates and dropDuplicatesWithinWatermark should accept varargs > -- > > Key: SPARK-48482 > URL: https://issues.apache.org/jira/browse/SPARK-48482 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
Wei Liu created SPARK-48482: --- Summary: dropDuplicates and dropDuplicatesWithinWatermark should accept varargs Key: SPARK-48482 URL: https://issues.apache.org/jira/browse/SPARK-48482 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 4.0.0 Reporter: Wei Liu
[jira] [Updated] (SPARK-48476) NPE thrown when delimiter set to null in CSV
[ https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48476: --- Labels: pull-request-available (was: ) > NPE thrown when delimiter set to null in CSV > > > Key: SPARK-48476 > URL: https://issues.apache.org/jira/browse/SPARK-48476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > When the delimiter is set to null, we currently throw an NPE. We should > raise a clear, customer-facing error instead. > repro: > spark.read.format("csv") > .option("delimiter", null) > .load()
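The intended behavior can be sketched in plain Python. The `validate_csv_options` helper and the bracketed error-class tag below are illustrative only, not the actual Spark code path: the point is to fail fast with a descriptive message instead of letting a null value propagate into an NPE deep inside the CSV parser.

```python
def validate_csv_options(options):
    """Reject option values that would otherwise surface later as an NPE
    (hypothetical validation helper; the error-class name is illustrative)."""
    if "delimiter" in options and options["delimiter"] is None:
        raise ValueError(
            "[INVALID_DELIMITER_VALUE.NULL_VALUE] The 'delimiter' option "
            "must be a non-null string, e.g. ',' or '\\t'."
        )
    return options

# A null delimiter now fails fast with a descriptive, user-facing error:
try:
    validate_csv_options({"delimiter": None})
except ValueError as e:
    print(e)
```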
[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48474: Assignee: BingKun Pan > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48474. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46808 [https://github.com/apache/spark/pull/46808] > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48467: Assignee: BingKun Pan > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48467. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46798 [https://github.com/apache/spark/pull/46798] > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47716: Assignee: Jack Chen > SQLQueryTestSuite flaky case due to view name conflict > -- > > Key: SPARK-47716 > URL: https://issues.apache.org/jira/browse/SPARK-47716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > > In SQLQueryTestSuite, the test case "Test logic for determining whether a > query is semantically sorted" can sometimes fail with an error > {{Cannot create table or view `main`.`default`.`t1` because it already > exists.}} > if run concurrently with other sql test cases that also create tables with > the same name.
[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47716. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45855 [https://github.com/apache/spark/pull/45855] > SQLQueryTestSuite flaky case due to view name conflict > -- > > Key: SPARK-47716 > URL: https://issues.apache.org/jira/browse/SPARK-47716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In SQLQueryTestSuite, the test case "Test logic for determining whether a > query is semantically sorted" can sometimes fail with an error > {{Cannot create table or view `main`.`default`.`t1` because it already > exists.}} > if run concurrently with other sql test cases that also create tables with > the same name.
[jira] [Assigned] (SPARK-48419) Foldable propagation replace foldable column should use origin column
[ https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48419: --- Assignee: KnightChess > Foldable propagation replace foldable column should use origin column > - > > Key: SPARK-48419 > URL: https://issues.apache.org/jira/browse/SPARK-48419 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4 >Reporter: KnightChess >Assignee: KnightChess >Priority: Major > Labels: pull-request-available > > The column name will be changed by `FoldablePropagation` in the optimizer. > before optimization: > ```shell > 'Project ['x, 'y, 'z] > +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > after optimization: > ```shell > Project [x#112, str AS Y#113, z#114] > +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > The column name `y` will be replaced with `Y`.
[jira] [Resolved] (SPARK-48419) Foldable propagation replace foldable column should use origin column
[ https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48419. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46742 [https://github.com/apache/spark/pull/46742] > Foldable propagation replace foldable column should use origin column > - > > Key: SPARK-48419 > URL: https://issues.apache.org/jira/browse/SPARK-48419 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4 >Reporter: KnightChess >Assignee: KnightChess >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The column name will be changed by `FoldablePropagation` in the optimizer. > before optimization: > ```shell > 'Project ['x, 'y, 'z] > +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > after optimization: > ```shell > Project [x#112, str AS Y#113, z#114] > +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114] > +- LocalRelation , [a#0, b#1] > ``` > The column name `y` will be replaced with `Y`.
[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48461. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46793 [https://github.com/apache/spark/pull/46793] > Replace NullPointerExceptions with proper error classes in AssertNotNull > expression > --- > > Key: SPARK-48461 > URL: https://issues.apache.org/jira/browse/SPARK-48461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [Code location > here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]
[jira] [Created] (SPARK-48480) StreamingQueryListener thread should not be interruptable
Wei Liu created SPARK-48480: --- Summary: StreamingQueryListener thread should not be interruptable Key: SPARK-48480 URL: https://issues.apache.org/jira/browse/SPARK-48480 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu
[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48446: Assignee: Yuchen Liu > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Assignee: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. > The argument should be a list. > The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Updated] (SPARK-48479) Support creating SQL functions in parser
[ https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48479: --- Labels: pull-request-available (was: ) > Support creating SQL functions in parser > > > Key: SPARK-48479 > URL: https://issues.apache.org/jira/browse/SPARK-48479 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Add Spark SQL parser for creating SQL functions.
[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48446. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46797 [https://github.com/apache/spark/pull/46797] > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Assignee: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Fix For: 4.0.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. > The argument should be a list. > The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.
[ https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48475. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46809 [https://github.com/apache/spark/pull/46809] > Optimize _get_jvm_function in PySpark. > -- > > Key: SPARK-48475 > URL: https://issues.apache.org/jira/browse/SPARK-48475 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-48479) Support creating SQL functions in parser
[ https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-48479: - Summary: Support creating SQL functions in parser (was: Support ccreating SQL functions in parser) > Support creating SQL functions in parser > > > Key: SPARK-48479 > URL: https://issues.apache.org/jira/browse/SPARK-48479 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > Add Spark SQL parser for creating SQL functions.
[jira] [Updated] (SPARK-48479) Support ccreating SQL functions in parser
[ https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-48479: - Summary: Support ccreating SQL functions in parser (was: Add support for creating SQL functions in parser) > Support ccreating SQL functions in parser > - > > Key: SPARK-48479 > URL: https://issues.apache.org/jira/browse/SPARK-48479 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > Add Spark SQL parser for creating SQL functions.
[jira] [Updated] (SPARK-48465) Avoid no-op empty relation propagation in AQE
[ https://issues.apache.org/jira/browse/SPARK-48465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48465: --- Labels: pull-request-available (was: ) > Avoid no-op empty relation propagation in AQE > - > > Key: SPARK-48465 > URL: https://issues.apache.org/jira/browse/SPARK-48465 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Priority: Major > Labels: pull-request-available > > We should avoid no-op empty relation propagation in AQE: if we convert an > empty QueryStageExec to an empty relation, it will be further wrapped into a new > query stage and executed -> produce an empty result -> empty relation propagation > again. This issue is currently not exposed because AQE will try to reuse the > shuffle.
[jira] [Created] (SPARK-48479) Add support for creating SQL functions in parser
Allison Wang created SPARK-48479: Summary: Add support for creating SQL functions in parser Key: SPARK-48479 URL: https://issues.apache.org/jira/browse/SPARK-48479 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Allison Wang Add Spark SQL parser for creating SQL functions.
[jira] [Resolved] (SPARK-48468) Add LogicalQueryStage interface in catalyst
[ https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48468. - Fix Version/s: 4.0.0 Resolution: Fixed > Add LogicalQueryStage interface in catalyst > --- > > Key: SPARK-48468 > URL: https://issues.apache.org/jira/browse/SPARK-48468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add `LogicalQueryStage` interface in catalyst so that it's visible in logical > rules
[jira] [Commented] (SPARK-48478) Allow passing iterator of PyArrow RecordBatches to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850865#comment-17850865 ] Ian Cook commented on SPARK-48478: -- For Connect, see class {{LocalRelation}} in {{{}python/pyspark/sql/connect/plan.py{}}}. Something similar could be used to create a local relation from an iterator of RecordBatches. (But do we need to create this as a cached remote relation? Creating it locally will just fill up client memory I think) For Classic, see {{{}_create_from_arrow_table{}}}. Something similar could be used to create a DataFrame from an iterator of RecordBatches. > Allow passing iterator of PyArrow RecordBatches to createDataFrame() > > > Key: SPARK-48478 > URL: https://issues.apache.org/jira/browse/SPARK-48478 > Project: Spark > Issue Type: Improvement > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Priority: Major > > As a follow-up to SPARK-48220: > For larger data, it would be nice to be able to pass an iterator of PyArrow > RecordBatches to {{{}createDataFrame(){}}}.
[jira] [Assigned] (SPARK-48468) Add LogicalQueryStage interface in catalyst
[ https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48468: --- Assignee: Ziqi Liu > Add LogicalQueryStage interface in catalyst > --- > > Key: SPARK-48468 > URL: https://issues.apache.org/jira/browse/SPARK-48468 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > > Add `LogicalQueryStage` interface in catalyst so that it's visible in logical > rules
[jira] [Resolved] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite
[ https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48477. - Fix Version/s: 4.0.0 Resolution: Fixed > Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite > -- > > Key: SPARK-48477 > URL: https://issues.apache.org/jira/browse/SPARK-48477 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-48478) Allow passing iterator of PyArrow RecordBatches to createDataFrame()
Ian Cook created SPARK-48478: Summary: Allow passing iterator of PyArrow RecordBatches to createDataFrame() Key: SPARK-48478 URL: https://issues.apache.org/jira/browse/SPARK-48478 Project: Spark Issue Type: Improvement Components: Connect, Input/Output, PySpark, SQL Affects Versions: 3.5.1, 4.0.0 Reporter: Ian Cook As a follow-up to SPARK-48220: For larger data, it would be nice to be able to pass an iterator of PyArrow RecordBatches to {{{}createDataFrame(){}}}.
[jira] [Updated] (SPARK-48478) Allow passing iterator of PyArrow RecordBatches to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-48478: - Language: Python > Allow passing iterator of PyArrow RecordBatches to createDataFrame() > > > Key: SPARK-48478 > URL: https://issues.apache.org/jira/browse/SPARK-48478 > Project: Spark > Issue Type: Improvement > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Priority: Major > > As a follow-up to SPARK-48220: > For larger data, it would be nice to be able to pass an iterator of PyArrow > RecordBatches to {{{}createDataFrame(){}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches
[ https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850855#comment-17850855 ] Ian Cook commented on SPARK-47466: -- For Connect, see the function {{to_table_as_iterator}} in {{python/pyspark/sql/connect/client/core.py}}. To return an iterator of RecordBatches we could add another function similar to that. For Classic, see the function {{_collect_as_arrow}} in {{python/pyspark/sql/pandas/conversion.py}}. To return an iterator of RecordBatches we could add another function similar to that. > Add PySpark DataFrame method to return iterator of PyArrow RecordBatches > > > Key: SPARK-47466 > URL: https://issues.apache.org/jira/browse/SPARK-47466 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > > As a follow-up to SPARK-47365: > {{toArrow()}} is useful when the data is relatively small. For larger data, > the best way to return the contents of a PySpark DataFrame in Arrow format is > to return an iterator of [PyArrow > RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
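As context for the proposal above, the lazy-batching pattern can be sketched in plain Python with no PyArrow dependency (`iter_batches` and the batch size are illustrative stand-ins, not Spark or PyArrow API): only one batch is materialized at a time, which is why an iterator of RecordBatches scales better than a single table for large data.

```python
from itertools import islice
from typing import Iterator, List, Tuple

Row = Tuple[int, str]

def iter_batches(rows: Iterator[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield fixed-size batches lazily; only one batch lives in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# A generator as the row source: nothing is materialized until a batch is pulled.
rows = ((i, f"name-{i}") for i in range(10))
batches = list(iter_batches(rows, batch_size=4))
# batch sizes: 4, 4, 2
```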
[jira] [Updated] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite
[ https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated SPARK-48477: - Summary: Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite (was: Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest) > Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite > -- > > Key: SPARK-48477 > URL: https://issues.apache.org/jira/browse/SPARK-48477 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest
[ https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48477: --- Labels: pull-request-available (was: ) > Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, > HivePlanTest > > > Key: SPARK-48477 > URL: https://issues.apache.org/jira/browse/SPARK-48477 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest
Rui Wang created SPARK-48477: Summary: Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest Key: SPARK-48477 URL: https://issues.apache.org/jira/browse/SPARK-48477 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 4.0.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48000) Hash join support for strings with collation (StringType only)
[ https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850809#comment-17850809 ] Uroš Bojanić commented on SPARK-48000: -- [~hudson] that one is no longer relevant - closed it, thanks > Hash join support for strings with collation (StringType only) > -- > > Key: SPARK-48000 > URL: https://issues.apache.org/jira/browse/SPARK-48000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48008) Support UDAF in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-48008: - Assignee: Pengfei Xu > Support UDAF in Spark Connect > - > > Key: SPARK-48008 > URL: https://issues.apache.org/jira/browse/SPARK-48008 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Pengfei Xu >Assignee: Pengfei Xu >Priority: Major > Labels: pull-request-available > > Currently Spark Connect supports only UDFs. We need to add support for UDAFs, > specifically `Aggregator[INT, BUF, OUT]`. > The user-facing API should not change, which includes Aggregator methods and > the `spark.udf.register("agg", udaf(agg))` API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48008) Support UDAF in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-48008. --- Fix Version/s: 4.0.0 Resolution: Fixed > Support UDAF in Spark Connect > - > > Key: SPARK-48008 > URL: https://issues.apache.org/jira/browse/SPARK-48008 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Pengfei Xu >Assignee: Pengfei Xu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently Spark Connect supports only UDFs. We need to add support for UDAFs, > specifically `Aggregator[INT, BUF, OUT]`. > The user-facing API should not change, which includes Aggregator methods and > the `spark.udf.register("agg", udaf(agg))` API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
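For background on the `Aggregator[IN, BUF, OUT]` contract mentioned above, a stdlib-Python sketch of its zero/reduce/merge/finish shape (the `Avg` aggregator and `run` driver are illustrative stand-ins, not Spark Connect API):

```python
from typing import Generic, Iterable, List, Tuple, TypeVar

IN = TypeVar("IN")
BUF = TypeVar("BUF")
OUT = TypeVar("OUT")

class Aggregator(Generic[IN, BUF, OUT]):
    """Sketch of the Aggregator contract: zero/reduce/merge/finish."""
    def zero(self) -> BUF: ...
    def reduce(self, b: BUF, a: IN) -> BUF: ...
    def merge(self, b1: BUF, b2: BUF) -> BUF: ...
    def finish(self, b: BUF) -> OUT: ...

class Avg(Aggregator[int, Tuple[int, int], float]):
    """Average as (sum, count) buffers."""
    def zero(self): return (0, 0)
    def reduce(self, b, a): return (b[0] + a, b[1] + 1)
    def merge(self, b1, b2): return (b1[0] + b2[0], b1[1] + b2[1])
    def finish(self, b): return b[0] / b[1]

def run(agg: Aggregator, partitions: Iterable[Iterable]) -> float:
    """Reduce each partition locally, then merge partial buffers and finish."""
    bufs: List = []
    for part in partitions:
        b = agg.zero()
        for x in part:
            b = agg.reduce(b, x)
        bufs.append(b)
    out = bufs[0]
    for b in bufs[1:]:
        out = agg.merge(out, b)
    return agg.finish(out)
```

The per-partition reduce followed by a merge of partial buffers is what makes the contract distributable.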
[jira] [Resolved] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48292. - Fix Version/s: 4.0.0 Assignee: angerszhu (was: L. C. Hsieh) Resolution: Fixed > Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage > when committed file not consistent with task status > -- > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Assignee: angerszhu >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > When a task attempt fails but it is authorized to do task commit, > OutputCommitCoordinator will fail the stage with a reason message > which says that the task commit succeeded, but actually the driver never knows if a > task commit is successful or not. We should update the reason message to make > it less confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48000) Hash join support for strings with collation (StringType only)
[ https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850805#comment-17850805 ] Hudson commented on SPARK-48000: User 'uros-db' has created a pull request for this issue: https://github.com/apache/spark/pull/46166 > Hash join support for strings with collation (StringType only) > -- > > Key: SPARK-48000 > URL: https://issues.apache.org/jira/browse/SPARK-48000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48292: --- Assignee: L. C. Hsieh > Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage > when committed file not consistent with task status > -- > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > > When a task attempt fails but it is authorized to do task commit, > OutputCommitCoordinator will fail the stage with a reason message > which says that the task commit succeeded, but actually the driver never knows if a > task commit is successful or not. We should update the reason message to make > it less confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48476) NPE thrown when delimiter set to null in CSV
Milan Stefanovic created SPARK-48476: Summary: NPE thrown when delimiter set to null in CSV Key: SPARK-48476 URL: https://issues.apache.org/jira/browse/SPARK-48476 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Milan Stefanovic When customers set the delimiter to null, we currently throw an NPE. We should throw a customer-facing error instead. Repro: spark.read.format("csv").option("delimiter", null).load() -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
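A minimal sketch of the fix the report asks for, in plain Python (the `IllegalOptionError` class and `get_required_option` helper are hypothetical, not actual Spark code): validate the option up front and raise a user-facing error, rather than letting a null value surface later as a bare NPE.

```python
class IllegalOptionError(ValueError):
    """Hypothetical user-facing error, raised instead of a raw NullPointerException."""

def get_required_option(options: dict, key: str) -> str:
    """Return the option value, failing with an actionable message if it is null."""
    value = options.get(key)
    if value is None:
        raise IllegalOptionError(
            f"Option '{key}' must not be null; pass a non-null string value."
        )
    return value
```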
[jira] [Updated] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48474: --- Labels: pull-request-available (was: ) > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
BingKun Pan created SPARK-48474: --- Summary: Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` Key: SPARK-48474 URL: https://issues.apache.org/jira/browse/SPARK-48474 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
[ https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carmen Kwan updated SPARK-48473: Component/s: SQL (was: Spark Core) > Add extensible trait to allow-list non-deterministic expressions in operators > in CheckAnalysis > -- > > Key: SPARK-48473 > URL: https://issues.apache.org/jira/browse/SPARK-48473 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Carmen Kwan >Priority: Major > > CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception > when there is a non-deterministic expression within an operator that is not > allow-listed in the case match check > [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]: > > {code:java} > case o if o.expressions.exists(!_.deterministic) && > !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] && > !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] && > !o.isInstanceOf[Expand] && > !o.isInstanceOf[Generate] && > // Lateral join is checked in checkSubqueryExpression. > !o.isInstanceOf[LateralJoin] => > // The rule above is used to check Aggregate operator. > o.failAnalysis( > errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS", > messageParameters = Map("sqlExprs" -> > o.expressions.map(toSQLExpr(_)).mkString(", ")) > ){code} > > It would be nice to add a generic trait/class to this case match that is > allow-listed, so that when new non-deterministic expressions that live in > other repositories need to be allow-listed, we don't need to wait for a new > Spark release. 
For example, in Delta Lake, we want to allow-list a specific > non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause > operator as part of Delta's [Identity Column > implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner > overall to add an abstract generic class there than to put Delta-specific > logic into this CheckAnalysis rule. > It would be beneficial to backport this to Spark 3.5 so that we don't need to > wait for Spark 4 to benefit from this low-risk change. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42252) Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config
[ https://issues.apache.org/jira/browse/SPARK-42252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42252: --- Labels: pull-request-available (was: ) > Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config > -- > > Key: SPARK-42252 > URL: https://issues.apache.org/jira/browse/SPARK-42252 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0 >Reporter: Wei Guo >Priority: Minor > Labels: pull-request-available > > After Jira SPARK-28209 and PR > [25007|https://github.com/apache/spark/pull/25007], the new shuffle writer > API was proposed. All shuffle writers (BypassMergeSortShuffleWriter, > SortShuffleWriter, UnsafeShuffleWriter) are based on > LocalDiskShuffleMapOutputWriter to write local disk shuffle files. The config > spark.shuffle.unsafe.file.output.buffer used in > LocalDiskShuffleMapOutputWriter was previously only used in UnsafeShuffleWriter. > > It's better to rename it to something more suitable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis
Carmen Kwan created SPARK-48473: --- Summary: Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis Key: SPARK-48473 URL: https://issues.apache.org/jira/browse/SPARK-48473 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0, 3.5.2 Reporter: Carmen Kwan CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when there is a non-deterministic expression within an operator that is not allow-listed in the case match check [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]: {code:java} case o if o.expressions.exists(!_.deterministic) && !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] && !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] && !o.isInstanceOf[Expand] && !o.isInstanceOf[Generate] && // Lateral join is checked in checkSubqueryExpression. !o.isInstanceOf[LateralJoin] => // The rule above is used to check Aggregate operator. o.failAnalysis( errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS", messageParameters = Map("sqlExprs" -> o.expressions.map(toSQLExpr(_)).mkString(", ")) ){code} It would be nice to add a generic trait/class to this case match that is allow-listed, so that when new non-deterministic expressions that live in other repositories need to be allow-listed, we don't need to wait for a new Spark release. For example, in Delta Lake, we want to allow-list a specific non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause operator as part of Delta's [Identity Column implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner overall to add an abstract generic class there than to put Delta-specific logic into this CheckAnalysis rule. It would be beneficial to backport this to Spark 3.5 so that we don't need to wait for Spark 4 to benefit from this low-risk change. 
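The extension point this issue asks for can be sketched in plain Python (class names are illustrative; the real check is the Scala case match quoted in the issue description): operators that mix in a marker type are exempt from the non-determinism check, so external projects can opt in without modifying CheckAnalysis itself.

```python
class AllowsNonDeterministicExpressions:
    """Hypothetical marker trait: operators mixing this in are exempt from the check."""

class Operator:
    def __init__(self, deterministic: bool):
        self.deterministic = deterministic

# An allow-listed operator (stand-in for e.g. Project, or a Delta operator
# defined in an external repository that opts in via the marker).
class Project(Operator, AllowsNonDeterministicExpressions):
    pass

# An operator that is not allow-listed.
class Sample(Operator):
    pass

def check_analysis(op: Operator) -> None:
    """Fail on non-deterministic expressions unless the operator opted in."""
    if not op.deterministic and not isinstance(op, AllowsNonDeterministicExpressions):
        raise ValueError("INVALID_NON_DETERMINISTIC_EXPRESSIONS")
```

The `isinstance` test against the marker replaces the closed list of `isInstanceOf` checks, which is what makes the rule extensible from other repositories.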
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48439) Derby: Retain as many significant digits as possible when decimal precision greater than 31
[ https://issues.apache.org/jira/browse/SPARK-48439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48439: Assignee: Kent Yao > Derby: Retain as many significant digits as possible when decimal precision > greater than 31 > > > Key: SPARK-48439 > URL: https://issues.apache.org/jira/browse/SPARK-48439 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48439) Derby: Retain as many significant digits as possible when decimal precision greater than 31
[ https://issues.apache.org/jira/browse/SPARK-48439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48439. -- Resolution: Fixed Resolved by https://github.com/apache/spark/pull/46776 > Derby: Retain as many significant digits as possible when decimal precision > greater than 31 > > > Key: SPARK-48439 > URL: https://issues.apache.org/jira/browse/SPARK-48439 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47260: Assignee: Wei Guo > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Wei Guo >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48318: -- Assignee: Apache Spark > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48318: -- Assignee: (was: Apache Spark) > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48471: Assignee: Kent Yao > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48471: -- Assignee: (was: Apache Spark) > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48471: -- Assignee: Apache Spark > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48471. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46802 [https://github.com/apache/spark/pull/46802] > Improve documentation and usage guide for history server > > > Key: SPARK-48471 > URL: https://issues.apache.org/jira/browse/SPARK-48471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47260. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46777 [https://github.com/apache/spark/pull/46777] > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47260: -- Assignee: (was: Apache Spark) > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors
[ https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47260: -- Assignee: Apache Spark > Assign classes to Row to JSON errors > - > > Key: SPARK-47260 > URL: https://issues.apache.org/jira/browse/SPARK-47260 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* > defined in {*}core/src/main/resources/error/error-classes.json{*}. The name > should be short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48415) TypeName support parameterized datatypes
[ https://issues.apache.org/jira/browse/SPARK-48415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reopened SPARK-48415:
-----------------------------------

> TypeName support parameterized datatypes
> ----------------------------------------
>
>                 Key: SPARK-48415
>                 URL: https://issues.apache.org/jira/browse/SPARK-48415
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
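A hedged sketch of what "TypeName support parameterized datatypes" could mean in practice: the type-name string carries the type's parameters instead of just the bare family name. `DecimalLike` below is a hypothetical stand-in, not `pyspark.sql.types.DecimalType` itself.

```python
class DecimalLike:
    """Hypothetical parameterized data type with a parameter-aware name."""

    def __init__(self, precision: int = 10, scale: int = 0):
        self.precision = precision
        self.scale = scale

    def typeName(self) -> str:
        # Parameterized name, e.g. "decimal(18,2)" rather than just "decimal".
        return f"decimal({self.precision},{self.scale})"

assert DecimalLike(18, 2).typeName() == "decimal(18,2)"
```

The payoff of a parameterized name is round-trippability: the string alone is enough to reconstruct the exact type, which a bare "decimal" is not.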
[jira] [Updated] (SPARK-48471) Improve documentation and usage guide for history server
[ https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48471:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve documentation and usage guide for history server
> --------------------------------------------------------
>
>                 Key: SPARK-48471
>                 URL: https://issues.apache.org/jira/browse/SPARK-48471
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Kent Yao
>            Priority: Major
>              Labels: pull-request-available
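For context on what the history-server usage guide covers: the basic setup is to enable event logging in the application and point the history server at the same log directory. A minimal `spark-defaults.conf` fragment along the lines of the Spark monitoring docs (the paths here are illustrative, not prescribed):

```
# Application side: write event logs for finished apps.
spark.eventLog.enabled           true
spark.eventLog.dir               file:/tmp/spark-events

# History-server side: read logs from the same directory,
# then start it with ./sbin/start-history-server.sh
spark.history.fs.logDirectory    file:/tmp/spark-events
```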
[jira] [Updated] (SPARK-48280) Add Expression Walker for Testing
[ https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48280:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add Expression Walker for Testing
> ---------------------------------
>
>                 Key: SPARK-48280
>                 URL: https://issues.apache.org/jira/browse/SPARK-48280
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Mihailo Milosevic
>            Priority: Major
>              Labels: pull-request-available
[jira] [Created] (SPARK-48472) Expression Walker Test
Mihailo Milosevic created SPARK-48472:
--------------------------------------

             Summary: Expression Walker Test
                 Key: SPARK-48472
                 URL: https://issues.apache.org/jira/browse/SPARK-48472
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Mihailo Milosevic
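An "expression walker" in the sense of the two tickets above can be sketched as a generic pre-order traversal that applies a check to every node of an expression tree. `Expr` and `walk` here are hypothetical helpers for illustration, not Spark's Catalyst classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Expr:
    """Hypothetical expression-tree node: a name plus child expressions."""
    name: str
    children: List["Expr"] = field(default_factory=list)

def walk(expr: Expr, check: Callable[[Expr], None]) -> None:
    check(expr)                  # visit the node itself first
    for child in expr.children:
        walk(child, check)       # then recurse into its children

# Walk a small tree and record every node visited, in pre-order.
tree = Expr("Add", [Expr("Literal"),
                    Expr("Upper", [Expr("AttributeReference")])])
visited: List[str] = []
walk(tree, lambda e: visited.append(e.name))
```

The point of such a walker in a test suite is coverage: the same `check` callback is applied uniformly to every expression node, so no node type is silently skipped.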
[jira] [Created] (SPARK-48471) Improve documentation and usage guide for history server
Kent Yao created SPARK-48471:
-----------------------------

             Summary: Improve documentation and usage guide for history server
                 Key: SPARK-48471
                 URL: https://issues.apache.org/jira/browse/SPARK-48471
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Kent Yao