[jira] [Updated] (SPARK-48484) V2Write uses the same TaskAttemptId for different task attempts

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48484:
---
Labels: pull-request-available  (was: )

> V2Write uses the same TaskAttemptId for different task attempts
> --
>
> Key: SPARK-48484
> URL: https://issues.apache.org/jira/browse/SPARK-48484
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48484) V2Write uses the same TaskAttemptId for different task attempts

2024-05-30 Thread Yang Jie (Jira)
Yang Jie created SPARK-48484:


 Summary: V2Write uses the same TaskAttemptId for different task attempts
 Key: SPARK-48484
 URL: https://issues.apache.org/jira/browse/SPARK-48484
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: Yang Jie









[jira] [Created] (SPARK-48483) Allow UnsafeExternalSorter to spill when other consumer request memory

2024-05-30 Thread Jin Chengcheng (Jira)
Jin Chengcheng created SPARK-48483:
--

 Summary: Allow UnsafeExternalSorter to spill when other consumer 
request memory
 Key: SPARK-48483
 URL: https://issues.apache.org/jira/browse/SPARK-48483
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
 Environment: Ubuntu
Reporter: Jin Chengcheng
 Fix For: 4.0.0


The downstream Gluten (native Spark engine) hits an OOM exception.

 
{code:java}
24/04/27 11:42:59 ERROR [Executor task launch worker for task 403.0 in stage 
4.0 (TID 91404)] nmm.ManagedReservationListener: Error reserving memory from 
target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: 
Not enough spark off-heap execution memory. Acquired: 40.0 MiB, granted: 8.0 
MiB. Try tweaking config option spark.memory.offHeap.size to get larger space 
to run this application. 
Current config settings: 
spark.gluten.memory.offHeap.size.in.bytes=50.0 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=12.5 GiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=6.3 GiB
Memory consumer stats: 
Task.91404: Current used bytes: 12.5 GiB, peak bytes: N/A
+- org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@a7836d4: Current used bytes: 12.4 GiB, peak bytes: N/A
\- Gluten.Tree.194: Current used bytes: 56.0 MiB, peak bytes: 11.7 GiB
   \- root.194: Current used bytes: 56.0 MiB, peak bytes: 11.7 GiB
      +- WholeStageIterator.194: Current used bytes: 32.0 MiB, peak bytes: 9.0 GiB
      |  \- single: Current used bytes: 23.0 MiB, peak bytes: 9.0 GiB
      |     +- task.Gluten_Stage_4_TID_91404: Current used bytes: 23.0 MiB, peak bytes: 9.0 GiB
      |     |  +- node.3: Current used bytes: 21.0 MiB, peak bytes: 9.0 GiB
      |     |  |  +- op.3.1.0.HashBuild: Current used bytes: 10.8 MiB, peak bytes: 8.5 GiB
      |     |  |  \- op.3.0.0.HashProbe: Current used bytes: 9.2 MiB, peak bytes: 21.6 MiB
      |     |  +- node.5: Current used bytes: 1024.0 KiB, peak bytes: 2.0 MiB
      |     |  |  \- op.5.0.0.FilterProject: Current used bytes: 129.4 KiB, peak bytes: 1232.0 KiB
      |     |  +- node.2: Current used bytes: 1024.0 KiB, peak bytes: 1024.0 KiB
      |     |  |  \- op.2.1.0.FilterProject: Current used bytes: 128.4 KiB, peak bytes: 192.4 KiB
      |     |  +- node.1: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |  |  \- op.1.1.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |  +- node.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |  |  \- op.0.0.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |  \- node.4: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |     \- op.4.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     \- WholeStageIterator_default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- ArrowContextInstance.0: Current used bytes: 8.0 MiB, peak bytes: 8.0 MiB
      +- ColumnarToRow.2: Current used bytes: 8.0 MiB, peak bytes: 16.0 MiB
      |  \- single: Current used bytes: 6.0 MiB, peak bytes: 9.0 MiB
      |     \- ColumnarToRow_default_leaf: Current used bytes: 6.0 MiB, peak bytes: 9.0 MiB
      +- ShuffleReader.3: Current used bytes: 8.0 MiB, peak bytes: 16.0 MiB
      |  \- single: 

[jira] [Updated] (SPARK-48483) Allow UnsafeExternalSorter to spill when other consumer requests memory

2024-05-30 Thread Jin Chengcheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Chengcheng updated SPARK-48483:
---
Summary: Allow UnsafeExternalSorter to spill when other consumer requests 
memory  (was: Allow UnsafeExternalSorter to spill when other consumer request 
memory)

> Allow UnsafeExternalSorter to spill when other consumer requests memory
> ---
>
> Key: SPARK-48483
> URL: https://issues.apache.org/jira/browse/SPARK-48483
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
> Environment: Ubuntu
>Reporter: Jin Chengcheng
>Priority: Major
> Fix For: 4.0.0
>
>
> The downstream Gluten (native Spark engine) hits an OOM exception.
>  
> {code:java}
> 24/04/27 11:42:59 ERROR [Executor task launch worker for task 403.0 in stage 
> 4.0 (TID 91404)] nmm.ManagedReservationListener: Error reserving memory from 
> target
> org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException:
>  Not enough spark off-heap execution memory. Acquired: 40.0 MiB, granted: 8.0 
> MiB. Try tweaking config option spark.memory.offHeap.size to get larger space 
> to run this application. 
> Current config settings: 
>   spark.gluten.memory.offHeap.size.in.bytes=50.0 GiB
>   spark.gluten.memory.task.offHeap.size.in.bytes=12.5 GiB
>   spark.gluten.memory.conservative.task.offHeap.size.in.bytes=6.3 GiB
> Memory consumer stats: 
>   Task.91404: Current used bytes: 12.5 GiB, peak bytes: N/A
>   +- org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@a7836d4: Current used bytes: 12.4 GiB, peak bytes: N/A
>   \- Gluten.Tree.194: Current used bytes: 56.0 MiB, peak bytes: 11.7 GiB
>      \- root.194: Current used bytes: 56.0 MiB, peak bytes: 11.7 GiB
>         +- WholeStageIterator.194: Current used bytes: 32.0 MiB, peak bytes: 9.0 GiB
>         |  \- single: Current used bytes: 23.0 MiB, peak bytes: 9.0 GiB
>         |     +- task.Gluten_Stage_4_TID_91404: Current used bytes: 23.0 MiB, peak bytes: 9.0 GiB
>         |     |  +- node.3: Current used bytes: 21.0 MiB, peak bytes: 9.0 GiB
>         |     |  |  +- op.3.1.0.HashBuild: Current used bytes: 10.8 MiB, peak bytes: 8.5 GiB
>         |     |  |  \- op.3.0.0.HashProbe: Current used bytes: 9.2 MiB, peak bytes: 21.6 MiB
>         |     |  +- node.5: Current used bytes: 1024.0 KiB, peak bytes: 2.0 MiB
>         |     |  |  \- op.5.0.0.FilterProject: Current used bytes: 129.4 KiB, peak bytes: 1232.0 KiB
>         |     |  +- node.2: Current used bytes: 1024.0 KiB, peak bytes: 1024.0 KiB
>         |     |  |  \- op.2.1.0.FilterProject: Current used bytes: 128.4 KiB, peak bytes: 192.4 KiB
>         |     |  +- node.1: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         |     |  |  \- op.1.1.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         |     |  +- node.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         |     |  |  \- op.0.0.0.ValueStream: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         |     |  \- node.4: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         |     |     \- op.4.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         |     \- WholeStageIterator_default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
>         +- ArrowContextInstance.0: Current used bytes: 8.0 MiB, peak bytes: 8.0 MiB
>         +- ColumnarToRow.2: Current used bytes: 8.0 MiB, peak bytes: 16.0 MiB
>         |  

[jira] [Updated] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48482:
---
Labels: pull-request-available  (was: )

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
> --
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
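For reference, a minimal PySpark sketch of the current list-based call and the requested varargs form (the example DataFrame is illustrative):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "val"])

# Today the subset must be passed as a list:
df.dropDuplicates(["id", "val"]).show()

# The ticket asks to also accept varargs, mirroring select()/groupBy():
# df.dropDuplicates("id", "val")
{code}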







[jira] [Created] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs

2024-05-30 Thread Wei Liu (Jira)
Wei Liu created SPARK-48482:
---

 Summary: dropDuplicates and dropDuplicatesWithinWatermark should 
accept varargs
 Key: SPARK-48482
 URL: https://issues.apache.org/jira/browse/SPARK-48482
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Wei Liu









[jira] [Updated] (SPARK-48476) NPE thrown when delimiter set to null in CSV

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48476:
---
Labels: pull-request-available  (was: )

> NPE thrown when delimiter set to null in CSV
> 
>
> Key: SPARK-48476
> URL: https://issues.apache.org/jira/browse/SPARK-48476
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Milan Stefanovic
>Priority: Major
>  Labels: pull-request-available
>
> When customers set the delimiter option to null, we currently throw an NPE. We 
> should throw a customer-facing error instead.
> repro:
> spark.read.format("csv")
>   .option("delimiter", null)
>   .load()
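A runnable sketch of the repro, assuming an illustrative CSV path (in PySpark the null is passed as None):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Currently surfaces a NullPointerException from the JVM instead of a
# user-facing error:
spark.read.format("csv").option("delimiter", None).load("/tmp/data.csv")
{code}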






[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48474:


Assignee: BingKun Pan

> Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
> ---
>
> Key: SPARK-48474
> URL: https://issues.apache.org/jira/browse/SPARK-48474
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48474.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46808
[https://github.com/apache/spark/pull/46808]

> Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
> ---
>
> Key: SPARK-48474
> URL: https://issues.apache.org/jira/browse/SPARK-48474
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48467:


Assignee: BingKun Pan

> Upgrade Maven to 3.9.7
> --
>
> Key: SPARK-48467
> URL: https://issues.apache.org/jira/browse/SPARK-48467
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48467.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46798
[https://github.com/apache/spark/pull/46798]

> Upgrade Maven to 3.9.7
> --
>
> Key: SPARK-48467
> URL: https://issues.apache.org/jira/browse/SPARK-48467
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47716:


Assignee: Jack Chen

> SQLQueryTestSuite flaky case due to view name conflict
> --
>
> Key: SPARK-47716
> URL: https://issues.apache.org/jira/browse/SPARK-47716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
>
> In SQLQueryTestSuite, the test case "Test logic for determining whether a 
> query is semantically sorted" can sometimes fail with an error
> {{Cannot create table or view `main`.`default`.`t1` because it already 
> exists.}}
> if run concurrently with other SQL test cases that also create tables with 
> the same name.
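A minimal sketch of the race, assuming an active `spark` session (the second statement stands in for a concurrently running suite):

{code:python}
# Two test cases sharing one catalog race on the same table name:
spark.sql("CREATE TABLE t1 AS SELECT 1 AS col")  # suite A
spark.sql("CREATE TABLE t1 AS SELECT 2 AS col")  # suite B, concurrently
# -> Cannot create table or view `main`.`default`.`t1` because it already exists.
{code}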






[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47716.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45855
[https://github.com/apache/spark/pull/45855]

> SQLQueryTestSuite flaky case due to view name conflict
> --
>
> Key: SPARK-47716
> URL: https://issues.apache.org/jira/browse/SPARK-47716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In SQLQueryTestSuite, the test case "Test logic for determining whether a 
> query is semantically sorted" can sometimes fail with an error
> {{Cannot create table or view `main`.`default`.`t1` because it already 
> exists.}}
> if run concurrently with other SQL test cases that also create tables with 
> the same name.






[jira] [Assigned] (SPARK-48419) Foldable propagation replace foldable column should use origin column

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48419:
---

Assignee: KnightChess

> Foldable propagation replace foldable column should use origin column
> -
>
> Key: SPARK-48419
> URL: https://issues.apache.org/jira/browse/SPARK-48419
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4
>Reporter: KnightChess
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
>
> The column name will be changed by `FoldablePropagation` in the optimizer.
> Before optimization:
> ```shell
> 'Project ['x, 'y, 'z]
> +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114]
>    +- LocalRelation , [a#0, b#1]
> ```
> After optimization:
> ```shell
> Project [x#112, str AS Y#113, z#114]
> +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114]
>    +- LocalRelation , [a#0, b#1]
> ```
> The column name `y` will be replaced with 'Y'.






[jira] [Resolved] (SPARK-48419) Foldable propagation replace foldable column should use origin column

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48419.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46742
[https://github.com/apache/spark/pull/46742]

> Foldable propagation replace foldable column should use origin column
> -
>
> Key: SPARK-48419
> URL: https://issues.apache.org/jira/browse/SPARK-48419
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 4.0.0, 3.5.1, 3.3.4
>Reporter: KnightChess
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The column name will be changed by `FoldablePropagation` in the optimizer.
> Before optimization:
> ```shell
> 'Project ['x, 'y, 'z]
> +- 'Project ['a AS x#112, str AS Y#113, 'b AS z#114]
>    +- LocalRelation , [a#0, b#1]
> ```
> After optimization:
> ```shell
> Project [x#112, str AS Y#113, z#114]
> +- Project [a#0 AS x#112, str AS Y#113, b#1 AS z#114]
>    +- LocalRelation , [a#0, b#1]
> ```
> The column name `y` will be replaced with 'Y'.






[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48461.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46793
[https://github.com/apache/spark/pull/46793]

> Replace NullPointerExceptions with proper error classes in AssertNotNull 
> expression
> ---
>
> Key: SPARK-48461
> URL: https://issues.apache.org/jira/browse/SPARK-48461
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> [Code location 
> here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]






[jira] [Created] (SPARK-48480) StreamingQueryListener thread should not be interruptable

2024-05-30 Thread Wei Liu (Jira)
Wei Liu created SPARK-48480:
---

 Summary: StreamingQueryListener thread should not be interruptable
 Key: SPARK-48480
 URL: https://issues.apache.org/jira/browse/SPARK-48480
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SS
Affects Versions: 4.0.0
Reporter: Wei Liu









[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48446:


Assignee: Yuchen Liu

> Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
> --
>
> Key: SPARK-48446
> URL: https://issues.apache.org/jira/browse/SPARK-48446
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yuchen Liu
>Assignee: Yuchen Liu
>Priority: Minor
>  Labels: easyfix, pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For dropDuplicates, the example on 
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22]
>  is out of date compared with 
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html].
>  The argument should be a list.
> The same discrepancy applies to dropDuplicatesWithinWatermark.
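For reference, the list-based syntax that the API reference documents, in a minimal streaming sketch (the rate source and column names are illustrative):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
stream = spark.readStream.format("rate").load()

deduped = (
    stream
    .withWatermark("timestamp", "10 minutes")   # required for the
    .dropDuplicatesWithinWatermark(["value"])   # WithinWatermark variant
)
{code}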






[jira] [Updated] (SPARK-48479) Support creating SQL functions in parser

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48479:
---
Labels: pull-request-available  (was: )

> Support creating SQL functions in parser
> 
>
> Key: SPARK-48479
> URL: https://issues.apache.org/jira/browse/SPARK-48479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Add Spark SQL parser for creating SQL functions.
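For context, the kind of DDL such a parser would accept, as a SQL-standard-style sketch assuming an active `spark` session (the exact grammar is defined by the linked pull request):

{code:python}
spark.sql("""
    CREATE FUNCTION square(x INT)
    RETURNS INT
    RETURN x * x
""")
spark.sql("SELECT square(4)").show()
{code}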






[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48446.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46797
[https://github.com/apache/spark/pull/46797]

> Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
> --
>
> Key: SPARK-48446
> URL: https://issues.apache.org/jira/browse/SPARK-48446
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yuchen Liu
>Assignee: Yuchen Liu
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 4.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For dropDuplicates, the example on 
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22]
>  is out of date compared with 
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html].
>  The argument should be a list.
> The same discrepancy applies to dropDuplicatesWithinWatermark.






[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48475.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46809
[https://github.com/apache/spark/pull/46809]

> Optimize _get_jvm_function in PySpark.
> --
>
> Key: SPARK-48475
> URL: https://issues.apache.org/jira/browse/SPARK-48475
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48479) Support creating SQL functions in parser

2024-05-30 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-48479:
-
Summary: Support creating SQL functions in parser  (was: Support ccreating 
SQL functions in parser)

> Support creating SQL functions in parser
> 
>
> Key: SPARK-48479
> URL: https://issues.apache.org/jira/browse/SPARK-48479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Add Spark SQL parser for creating SQL functions.






[jira] [Updated] (SPARK-48479) Support ccreating SQL functions in parser

2024-05-30 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-48479:
-
Summary: Support ccreating SQL functions in parser  (was: Add support for 
creating SQL functions in parser)

> Support ccreating SQL functions in parser
> -
>
> Key: SPARK-48479
> URL: https://issues.apache.org/jira/browse/SPARK-48479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Add Spark SQL parser for creating SQL functions.






[jira] [Updated] (SPARK-48465) Avoid no-op empty relation propagation in AQE

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48465:
---
Labels: pull-request-available  (was: )

> Avoid no-op empty relation propagation in AQE
> -
>
> Key: SPARK-48465
> URL: https://issues.apache.org/jira/browse/SPARK-48465
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ziqi Liu
>Priority: Major
>  Labels: pull-request-available
>
> We should avoid no-op empty relation propagation in AQE: if we convert an 
> empty QueryStageExec to an empty relation, it will be wrapped into a new query 
> stage and executed -> produce an empty result -> trigger empty relation 
> propagation again. This issue is not currently exposed because AQE will try to 
> reuse the shuffle.






[jira] [Created] (SPARK-48479) Add support for creating SQL functions in parser

2024-05-30 Thread Allison Wang (Jira)
Allison Wang created SPARK-48479:


 Summary: Add support for creating SQL functions in parser
 Key: SPARK-48479
 URL: https://issues.apache.org/jira/browse/SPARK-48479
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Allison Wang


Add Spark SQL parser for creating SQL functions.






[jira] [Resolved] (SPARK-48468) Add LogicalQueryStage interface in catalyst

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48468.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Add LogicalQueryStage interface in catalyst
> ---
>
> Key: SPARK-48468
> URL: https://issues.apache.org/jira/browse/SPARK-48468
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ziqi Liu
>Assignee: Ziqi Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add `LogicalQueryStage` interface in catalyst so that it's visible in logical 
> rules






[jira] [Commented] (SPARK-48478) Allow passing iterator of PyArrow RecordBatches to createDataFrame()

2024-05-30 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850865#comment-17850865
 ] 

Ian Cook commented on SPARK-48478:
--

For Connect, see class {{LocalRelation}} in 
{{{}python/pyspark/sql/connect/plan.py{}}}. Something similar could be used to 
create a local relation from an iterator of RecordBatches. (But do we need to 
create this as a cached remote relation? Creating it locally will just fill up 
client memory I think)

For Classic, see {{{}_create_from_arrow_table{}}}. Something similar could be 
used to create a DataFrame from an iterator of RecordBatches.
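A sketch of the call shape the ticket asks for, assuming an active `spark` session (released versions accept a PyArrow Table, a pandas DataFrame, or a list of rows, but not an iterator of batches):

{code:python}
import pyarrow as pa

def batches():
    # Any generator of RecordBatches with a consistent schema:
    for i in range(3):
        yield pa.RecordBatch.from_pydict({"id": [i], "val": [float(i)]})

df = spark.createDataFrame(batches(), schema="id long, val double")
{code}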

> Allow passing iterator of PyArrow RecordBatches to createDataFrame()
> 
>
> Key: SPARK-48478
> URL: https://issues.apache.org/jira/browse/SPARK-48478
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Input/Output, PySpark, SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Priority: Major
>
> As a follow-up to SPARK-48220:
> For larger data, it would be nice to be able to pass an iterator of PyArrow 
> RecordBatches to {{{}createDataFrame(){}}}.






[jira] [Assigned] (SPARK-48468) Add LogicalQueryStage interface in catalyst

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48468:
---

Assignee: Ziqi Liu

> Add LogicalQueryStage interface in catalyst
> ---
>
> Key: SPARK-48468
> URL: https://issues.apache.org/jira/browse/SPARK-48468
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ziqi Liu
>Assignee: Ziqi Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add `LogicalQueryStage` interface in catalyst so that it's visible in logical 
> rules






[jira] [Resolved] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48477.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite
> --
>
> Key: SPARK-48477
> URL: https://issues.apache.org/jira/browse/SPARK-48477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48478) Allow passing iterator of PyArrow RecordBatches to createDataFrame()

2024-05-30 Thread Ian Cook (Jira)
Ian Cook created SPARK-48478:


 Summary: Allow passing iterator of PyArrow RecordBatches to 
createDataFrame()
 Key: SPARK-48478
 URL: https://issues.apache.org/jira/browse/SPARK-48478
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Input/Output, PySpark, SQL
Affects Versions: 3.5.1, 4.0.0
Reporter: Ian Cook


As a follow-up to SPARK-48220:

For larger data, it would be nice to be able to pass an iterator of PyArrow 
RecordBatches to {{{}createDataFrame(){}}}.






[jira] [Updated] (SPARK-48478) Allow passing iterator of PyArrow RecordBatches to createDataFrame()

2024-05-30 Thread Ian Cook (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated SPARK-48478:
-
Language: Python

> Allow passing iterator of PyArrow RecordBatches to createDataFrame()
> 
>
> Key: SPARK-48478
> URL: https://issues.apache.org/jira/browse/SPARK-48478
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Input/Output, PySpark, SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Priority: Major
>
> As a follow-up to SPARK-48220:
> For larger data, it would be nice to be able to pass an iterator of PyArrow 
> RecordBatches to {{{}createDataFrame(){}}}.






[jira] [Commented] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches

2024-05-30 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850855#comment-17850855
 ] 

Ian Cook commented on SPARK-47466:
--

For Connect, see the function {{to_table_as_iterator}} in 
{{python/pyspark/sql/connect/client/core.py}}. To return an iterator of 
RecordBatches we could add another function similar to that.

For Classic, see the function {{_collect_as_arrow}} in 
{{python/pyspark/sql/pandas/conversion.py}}. To return an iterator of 
RecordBatches we could add another function similar to that.
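A sketch of the Classic workaround mentioned above, given a DataFrame `df` (note `_collect_as_arrow()` is a private method that materializes all batches as a list, so it is not a true iterator; the ticket asks for a streaming equivalent):

{code:python}
for batch in df._collect_as_arrow():  # returns a list of pyarrow.RecordBatch
    print(batch.num_rows)
{code}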

> Add PySpark DataFrame method to return iterator of PyArrow RecordBatches
> 
>
> Key: SPARK-47466
> URL: https://issues.apache.org/jira/browse/SPARK-47466
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Ian Cook
>Priority: Major
>
> As a follow-up to SPARK-47365:
> {{toArrow()}} is useful when the data is relatively small. For larger data, 
> the best way to return the contents of a PySpark DataFrame in Arrow format is 
> to return an iterator of [PyArrow 
> RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html].






[jira] [Updated] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite

2024-05-30 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated SPARK-48477:
-
Summary: Refactor CollationSuite, CoalesceShufflePartitionsSuite, 
SQLExecutionSuite  (was: Refactor CollationSuite, 
CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest)

> Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite
> --
>
> Key: SPARK-48477
> URL: https://issues.apache.org/jira/browse/SPARK-48477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48477:
---
Labels: pull-request-available  (was: )

> Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, 
> HivePlanTest
> 
>
> Key: SPARK-48477
> URL: https://issues.apache.org/jira/browse/SPARK-48477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48477) Refactor CollationSuite, CoalesceShufflePartitionsSuite, SQLExecutionSuite, HivePlanTest

2024-05-30 Thread Rui Wang (Jira)
Rui Wang created SPARK-48477:


 Summary: Refactor CollationSuite, CoalesceShufflePartitionsSuite, 
SQLExecutionSuite, HivePlanTest
 Key: SPARK-48477
 URL: https://issues.apache.org/jira/browse/SPARK-48477
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 4.0.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Commented] (SPARK-48000) Hash join support for strings with collation (StringType only)

2024-05-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850809#comment-17850809
 ] 

Uroš Bojanić commented on SPARK-48000:
--

[~hudson] that one is no longer relevant - closed it, thanks

> Hash join support for strings with collation (StringType only)
> --
>
> Key: SPARK-48000
> URL: https://issues.apache.org/jira/browse/SPARK-48000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48008) Support UDAF in Spark Connect

2024-05-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-48008:
-

Assignee: Pengfei Xu

> Support UDAF in Spark Connect
> -
>
> Key: SPARK-48008
> URL: https://issues.apache.org/jira/browse/SPARK-48008
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Pengfei Xu
>Assignee: Pengfei Xu
>Priority: Major
>  Labels: pull-request-available
>
> Currently Spark Connect supports only UDFs. We need to add support for UDAFs, 
> specifically `Aggregator[IN, BUF, OUT]`.
> The user-facing API should not change, which includes Aggregator methods and 
> the `spark.udf.register("agg", udaf(agg))` API.






[jira] [Resolved] (SPARK-48008) Support UDAF in Spark Connect

2024-05-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-48008.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Support UDAF in Spark Connect
> -
>
> Key: SPARK-48008
> URL: https://issues.apache.org/jira/browse/SPARK-48008
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Pengfei Xu
>Assignee: Pengfei Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently Spark Connect supports only UDFs. We need to add support for UDAFs, 
> specifically `Aggregator[IN, BUF, OUT]`.
> The user-facing API should not change, which includes Aggregator methods and 
> the `spark.udf.register("agg", udaf(agg))` API.






[jira] [Resolved] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48292.
-
Fix Version/s: 4.0.0
 Assignee: angerszhu  (was: L. C. Hsieh)
   Resolution: Fixed

> Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage 
> when committed file not consistent with task status
> --
>
> Key: SPARK-48292
> URL: https://issues.apache.org/jira/browse/SPARK-48292
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Assignee: angerszhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When a task attempt fails but is authorized to do the task commit, 
> OutputCommitCoordinator fails the stage with a reason message saying that the 
> task commit succeeded, but the driver never actually knows whether a task 
> commit was successful. We should update the reason message to make it less 
> confusing.
> See https://github.com/apache/spark/pull/36564#discussion_r1598660630






[jira] [Commented] (SPARK-48000) Hash join support for strings with collation (StringType only)

2024-05-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850805#comment-17850805
 ] 

Hudson commented on SPARK-48000:


User 'uros-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/46166

> Hash join support for strings with collation (StringType only)
> --
>
> Key: SPARK-48000
> URL: https://issues.apache.org/jira/browse/SPARK-48000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2024-05-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48292:
---

Assignee: L. C. Hsieh

> Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage 
> when committed file not consistent with task status
> --
>
> Key: SPARK-48292
> URL: https://issues.apache.org/jira/browse/SPARK-48292
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Minor
>  Labels: pull-request-available
>
> When a task attempt fails but is authorized to do the task commit, 
> OutputCommitCoordinator fails the stage with a reason message saying that the 
> task commit succeeded, but the driver never actually knows whether a task 
> commit was successful. We should update the reason message to make it less 
> confusing.
> See https://github.com/apache/spark/pull/36564#discussion_r1598660630






[jira] [Created] (SPARK-48476) NPE thrown when delimiter set to null in CSV

2024-05-30 Thread Milan Stefanovic (Jira)
Milan Stefanovic created SPARK-48476:


 Summary: NPE thrown when delimiter set to null in CSV
 Key: SPARK-48476
 URL: https://issues.apache.org/jira/browse/SPARK-48476
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Milan Stefanovic


When customers set the delimiter option to null, we currently throw an NPE. We 
should throw a customer-facing error instead.

repro:

spark.read.format("csv")
  .option("delimiter", null)
  .load()






[jira] [Updated] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48474:
---
Labels: pull-request-available  (was: )

> Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
> ---
>
> Key: SPARK-48474
> URL: https://issues.apache.org/jira/browse/SPARK-48474
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`

2024-05-30 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48474:
---

 Summary: Fix the class name of the log in `SparkSubmitArguments` & 
`SparkSubmit`
 Key: SPARK-48474
 URL: https://issues.apache.org/jira/browse/SPARK-48474
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis

2024-05-30 Thread Carmen Kwan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carmen Kwan updated SPARK-48473:

Component/s: SQL
 (was: Spark Core)

> Add extensible trait to allow-list non-deterministic expressions in operators 
> in CheckAnalysis
> --
>
> Key: SPARK-48473
> URL: https://issues.apache.org/jira/browse/SPARK-48473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Carmen Kwan
>Priority: Major
>
> CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception 
> when there is a non-deterministic expression within an operator that is not 
> allow-listed in the case match check 
> [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
>  
> {code:java}
>  case o if o.expressions.exists(!_.deterministic) &&
>             !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>             !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
>             !o.isInstanceOf[Expand] &&
>             !o.isInstanceOf[Generate] &&
>             // Lateral join is checked in checkSubqueryExpression.
>             !o.isInstanceOf[LateralJoin] =>
>             // The rule above is used to check Aggregate operator.
>             o.failAnalysis(
>               errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
>               messageParameters = Map("sqlExprs" -> 
> o.expressions.map(toSQLExpr(_)).mkString(", "))
>             ){code}
>  
> It would be nice to add a generic trait/class to this case match's allow list 
> so that when new non-deterministic expressions that live in other repositories 
> need to be allow-listed, we don't have to wait for a new Spark release. For 
> example, in Delta Lake, we want to allow-list a specific non-deterministic 
> expression for the DeltaMergeIntoMatchedUpdateClause operator as part of 
> Delta's [Identity Column 
> implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner 
> overall to add an abstract generic class there than to put Delta-specific 
> logic into this CheckAnalysis rule.
> It would be beneficial to backport this to Spark 3.5 so that we don't need to 
> wait for Spark 4 to benefit from this low-risk change.






[jira] [Updated] (SPARK-42252) Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42252:
---
Labels: pull-request-available  (was: )

> Deprecate spark.shuffle.unsafe.file.output.buffer and add a new config
> --
>
> Key: SPARK-42252
> URL: https://issues.apache.org/jira/browse/SPARK-42252
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Wei Guo
>Priority: Minor
>  Labels: pull-request-available
>
> After SPARK-28209 and PR 
> [25007|https://github.com/apache/spark/pull/25007], a new shuffle writer API 
> was introduced. All shuffle writers (BypassMergeSortShuffleWriter, 
> SortShuffleWriter, UnsafeShuffleWriter) are now based on 
> LocalDiskShuffleMapOutputWriter to write local disk shuffle files. The config 
> spark.shuffle.unsafe.file.output.buffer, now used in 
> LocalDiskShuffleMapOutputWriter, was previously used only in 
> UnsafeShuffleWriter.
> It would be better to rename it to something more suitable.
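For reference, how the existing key is set today, as a minimal sketch (the 64k value is illustrative; the replacement key name is decided in the pull request):

{code:python}
from pyspark import SparkConf

# Despite its "unsafe"-specific name, this buffer size now applies to all
# shuffle writers via LocalDiskShuffleMapOutputWriter:
conf = SparkConf().set("spark.shuffle.unsafe.file.output.buffer", "64k")
{code}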






[jira] [Created] (SPARK-48473) Add extensible trait to allow-list non-deterministic expressions in operators in CheckAnalysis

2024-05-30 Thread Carmen Kwan (Jira)
Carmen Kwan created SPARK-48473:
---

 Summary: Add extensible trait to allow-list non-deterministic 
expressions in operators in CheckAnalysis
 Key: SPARK-48473
 URL: https://issues.apache.org/jira/browse/SPARK-48473
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0, 3.5.2
Reporter: Carmen Kwan


CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when 
there is a non-deterministic expression within an operator that is not 
allow-listed in the case match check 
[below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
 
{code:java}
 case o if o.expressions.exists(!_.deterministic) &&
            !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
            !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
            !o.isInstanceOf[Expand] &&
            !o.isInstanceOf[Generate] &&
            // Lateral join is checked in checkSubqueryExpression.
            !o.isInstanceOf[LateralJoin] =>
            // The rule above is used to check Aggregate operator.
            o.failAnalysis(
              errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
              messageParameters = Map("sqlExprs" -> 
o.expressions.map(toSQLExpr(_)).mkString(", "))
            ){code}
 

It would be nice to add a generic trait/class to this case match's allow list so 
that when new non-deterministic expressions that live in other repositories need 
to be allow-listed, we don't have to wait for a new Spark release. For example, 
in Delta Lake, we want to allow-list a specific non-deterministic expression for 
the DeltaMergeIntoMatchedUpdateClause operator as part of Delta's [Identity 
Column implementation|https://github.com/delta-io/delta/issues/1959]. It is 
cleaner overall to add an abstract generic class there than to put 
Delta-specific logic into this CheckAnalysis rule.

It would be beneficial to backport this to Spark 3.5 so that we don't need to 
wait for Spark 4 to benefit from this low-risk change.






[jira] [Assigned] (SPARK-48439) Derby: Retain as many significant digits as possible when decimal precision greater than 31

2024-05-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-48439:


Assignee: Kent Yao

> Derby: Retain as many significant digits as possible when decimal precision 
> greater than 31 
> 
>
> Key: SPARK-48439
> URL: https://issues.apache.org/jira/browse/SPARK-48439
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>







[jira] [Resolved] (SPARK-48439) Derby: Retain as many significant digits as possible when decimal precision greater than 31

2024-05-30 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-48439.
--
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/46776

> Derby: Retain as many significant digits as possible when decimal precision 
> greater than 31 
> 
>
> Key: SPARK-48439
> URL: https://issues.apache.org/jira/browse/SPARK-48439
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>







[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors

2024-05-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-47260:


Assignee: Wei Guo

>  Assign classes to Row to JSON errors
> -
>
> Key: SPARK-47260
> URL: https://issues.apache.org/jira/browse/SPARK-47260
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Wei Guo
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* 
> defined in {*}core/src/main/resources/error/error-classes.json{*}. The name 
> should be short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> already exist. Check the exception fields using {*}checkError(){*}. That 
> function checks only the meaningful error fields and avoids depending on the 
> error's text message, so tech editors can change the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (with a SQL query), replace 
> it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution telling users how to avoid and fix this kind of 
> error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
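
As a rough illustration of the testing pattern described above (the suite 
name, query, error class name, and parameters are placeholders, not the real 
values for these _LEGACY_ERROR_TEMP_32 errors):

{code:scala}
import org.apache.spark.SparkException
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSparkSession

// Placeholder suite: triggers the (renamed) error from user code and checks
// its stable fields with checkError(), without matching on the message text.
class RowToJsonErrorSuite extends QueryTest with SharedSparkSession {
  test("renamed Row-to-JSON error carries the expected fields") {
    checkError(
      exception = intercept[SparkException] {
        sql("SELECT some_failing_expression()").collect() // placeholder query
      },
      errorClass = "SOME_PROPER_ERROR_NAME",    // placeholder error class name
      parameters = Map("expr" -> "..."))        // placeholder parameters
  }
}
{code}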



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48318:
--

Assignee: Apache Spark

> Hash join support for strings with collation (complex types)
> 
>
> Key: SPARK-48318
> URL: https://issues.apache.org/jira/browse/SPARK-48318
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48318) Hash join support for strings with collation (complex types)

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48318:
--

Assignee: (was: Apache Spark)

> Hash join support for strings with collation (complex types)
> 
>
> Key: SPARK-48318
> URL: https://issues.apache.org/jira/browse/SPARK-48318
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server

2024-05-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48471:


Assignee: Kent Yao

> Improve documentation and usage guide for history server
> 
>
> Key: SPARK-48471
> URL: https://issues.apache.org/jira/browse/SPARK-48471
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48471:
--

Assignee: (was: Apache Spark)

> Improve documentation and usage guide for history server
> 
>
> Key: SPARK-48471
> URL: https://issues.apache.org/jira/browse/SPARK-48471
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48471) Improve documentation and usage guide for history server

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48471:
--

Assignee: Apache Spark

> Improve documentation and usage guide for history server
> 
>
> Key: SPARK-48471
> URL: https://issues.apache.org/jira/browse/SPARK-48471
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48471) Improve documentation and usage guide for history server

2024-05-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48471.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46802
[https://github.com/apache/spark/pull/46802]

> Improve documentation and usage guide for history server
> 
>
> Key: SPARK-48471
> URL: https://issues.apache.org/jira/browse/SPARK-48471
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47260) Assign classes to Row to JSON errors

2024-05-30 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-47260.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46777
[https://github.com/apache/spark/pull/46777]

>  Assign classes to Row to JSON errors
> -
>
> Key: SPARK-47260
> URL: https://issues.apache.org/jira/browse/SPARK-47260
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* 
> defined in {*}core/src/main/resources/error/error-classes.json{*}. The name 
> should be short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> already exist. Check the exception fields using {*}checkError(){*}. That 
> function checks only the meaningful error fields and avoids depending on the 
> error's text message, so tech editors can change the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (with a SQL query), replace 
> it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution telling users how to avoid and fix this kind of 
> error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47260:
--

Assignee: (was: Apache Spark)

>  Assign classes to Row to JSON errors
> -
>
> Key: SPARK-47260
> URL: https://issues.apache.org/jira/browse/SPARK-47260
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* 
> defined in {*}core/src/main/resources/error/error-classes.json{*}. The name 
> should be short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> already exist. Check the exception fields using {*}checkError(){*}. That 
> function checks only the meaningful error fields and avoids depending on the 
> error's text message, so tech editors can change the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (with a SQL query), replace 
> it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution telling users how to avoid and fix this kind of 
> error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47260:
--

Assignee: Apache Spark

>  Assign classes to Row to JSON errors
> -
>
> Key: SPARK-47260
> URL: https://issues.apache.org/jira/browse/SPARK-47260
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* 
> defined in {*}core/src/main/resources/error/error-classes.json{*}. The name 
> should be short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> already exist. Check the exception fields using {*}checkError(){*}. That 
> function checks only the meaningful error fields and avoids depending on the 
> error's text message, so tech editors can change the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (with a SQL query), replace 
> it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution telling users how to avoid and fix this kind of 
> error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47260:
--

Assignee: (was: Apache Spark)

>  Assign classes to Row to JSON errors
> -
>
> Key: SPARK-47260
> URL: https://issues.apache.org/jira/browse/SPARK-47260
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* 
> defined in {*}core/src/main/resources/error/error-classes.json{*}. The name 
> should be short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> already exist. Check the exception fields using {*}checkError(){*}. That 
> function checks only the meaningful error fields and avoids depending on the 
> error's text message, so tech editors can change the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (with a SQL query), replace 
> it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution telling users how to avoid and fix this kind of 
> error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47260) Assign classes to Row to JSON errors

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47260:
--

Assignee: Apache Spark

>  Assign classes to Row to JSON errors
> -
>
> Key: SPARK-47260
> URL: https://issues.apache.org/jira/browse/SPARK-47260
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_32[49-51]* 
> defined in {*}core/src/main/resources/error/error-classes.json{*}. The name 
> should be short but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> already exist. Check the exception fields using {*}checkError(){*}. That 
> function checks only the meaningful error fields and avoids depending on the 
> error's text message, so tech editors can change the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (with a SQL query), replace 
> it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution telling users how to avoid and fix this kind of 
> error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-48415) TypeName support parameterized datatypes

2024-05-30 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reopened SPARK-48415:
---

> TypeName support parameterized datatypes
> 
>
> Key: SPARK-48415
> URL: https://issues.apache.org/jira/browse/SPARK-48415
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48471) Improve documentation and usage guide for history server

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48471:
---
Labels: pull-request-available  (was: )

> Improve documentation and usage guide for history server
> 
>
> Key: SPARK-48471
> URL: https://issues.apache.org/jira/browse/SPARK-48471
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48280) Add Expression Walker for Testing

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48280:
---
Labels: pull-request-available  (was: )

> Add Expression Walker for Testing
> -
>
> Key: SPARK-48280
> URL: https://issues.apache.org/jira/browse/SPARK-48280
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48472) Expression Walker Test

2024-05-30 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-48472:
-

 Summary: Expression Walker Test
 Key: SPARK-48472
 URL: https://issues.apache.org/jira/browse/SPARK-48472
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48471) Improve documentation and usage guide for history server

2024-05-30 Thread Kent Yao (Jira)
Kent Yao created SPARK-48471:


 Summary: Improve documentation and usage guide for history server
 Key: SPARK-48471
 URL: https://issues.apache.org/jira/browse/SPARK-48471
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org