[jira] [Updated] (SPARK-49629) JsonProtocol should write 'Shuffle Push Read Metrics' and 'Merged Fallback Count' field of shuffle read metrics

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49629:
---
Labels: pull-request-available  (was: )

> JsonProtocol should write 'Shuffle Push Read Metrics' and 'Merged Fallback 
> Count' field of shuffle read metrics
> ---
>
> Key: SPARK-49629
> URL: https://issues.apache.org/jira/browse/SPARK-49629
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
>
> `JsonProtocol` currently writes the 'Push Based Shuffle' and 'Merged Fetch 
> Fallback Count' fields of shuffle read metrics, which is inconsistent with 
> the fields of task metrics read from JSON. Therefore, `JsonProtocol` should 
> write the 'Shuffle Push Read Metrics' and 'Merged Fallback Count' fields of 
> shuffle read metrics.
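
To illustrate why mismatched field names matter, here is a minimal Python sketch. It is not Spark's actual `JsonProtocol` code: the four key names come from the description above, while the function names, the nesting of the fallback count inside the push metrics object, and the default of 0 for a missing field are assumptions made only for illustration.

```python
import json

# Field names taken from the issue description.
OLD_PUSH_KEY = "Push Based Shuffle"
OLD_FALLBACK_KEY = "Merged Fetch Fallback Count"
NEW_PUSH_KEY = "Shuffle Push Read Metrics"
NEW_FALLBACK_KEY = "Merged Fallback Count"


def write_shuffle_read_metrics(fallback_count, push_key, fallback_key):
    """Hypothetical writer: serialize one metric under the given key names."""
    return json.dumps({push_key: {fallback_key: fallback_count}})


def read_merged_fallback_count(payload):
    """Hypothetical reader: looks only for the new key names, defaulting to 0."""
    push_metrics = json.loads(payload).get(NEW_PUSH_KEY, {})
    return push_metrics.get(NEW_FALLBACK_KEY, 0)


old_payload = write_shuffle_read_metrics(7, OLD_PUSH_KEY, OLD_FALLBACK_KEY)
new_payload = write_shuffle_read_metrics(7, NEW_PUSH_KEY, NEW_FALLBACK_KEY)

# With the old key names, the reader cannot find the value on the round trip.
lost = read_merged_fallback_count(old_payload)
# With matching key names, the value survives.
kept = read_merged_fallback_count(new_payload)
```

In this sketch the mismatch does not raise an error; the reader simply falls back to its default, which is why such an inconsistency can go unnoticed.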



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-49625) Spark Cluster Happy Path State Transition Test

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49625:
---
Labels: pull-request-available  (was: )

> Spark Cluster Happy Path State Transition Test
> --
>
> Key: SPARK-49625
> URL: https://issues.apache.org/jira/browse/SPARK-49625
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Qi Tan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49627) Run SortMergeJoin in batch

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49627:
---
Labels: pull-request-available  (was: )

> Run SortMergeJoin in batch
> --
>
> Key: SPARK-49627
> URL: https://issues.apache.org/jira/browse/SPARK-49627
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: t2.snappy.parquet
>
>







[jira] [Updated] (SPARK-49626) Support horizontal and vertical bar plots

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49626:
---
Labels: pull-request-available  (was: )

> Support horizontal and vertical bar plots
> -
>
> Key: SPARK-49626
> URL: https://issues.apache.org/jira/browse/SPARK-49626
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Support horizontal and vertical bar plots.






[jira] [Updated] (SPARK-43354) Re-enable test_create_dataframe_from_pandas_with_day_time_interval

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43354:
---
Labels: pull-request-available  (was: )

> Re-enable test_create_dataframe_from_pandas_with_day_time_interval
> --
>
> Key: SPARK-43354
> URL: https://issues.apache.org/jira/browse/SPARK-43354
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> This test fails with PyPy 3.8.






[jira] [Updated] (SPARK-49623) Rename prefix `appResources` in helm chart

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49623:
---
Labels: pull-request-available  (was: )

> Rename prefix `appResources` in helm chart
> --
>
> Key: SPARK-49623
> URL: https://issues.apache.org/jira/browse/SPARK-49623
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49621) Disable a flaky `EXEC IMMEDIATE STACK OVERFLOW` test case

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49621:
---
Labels: pull-request-available  (was: )

> Disable a flaky `EXEC IMMEDIATE STACK OVERFLOW` test case
> -
>
> Key: SPARK-49621
> URL: https://issues.apache.org/jira/browse/SPARK-49621
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49620) Fix `spark-rm` and `infra` docker files to create `pypy3.9` links

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49620:
---
Labels: pull-request-available  (was: )

> Fix `spark-rm` and `infra` docker files to create `pypy3.9` links
> -
>
> Key: SPARK-49620
> URL: https://issues.apache.org/jira/browse/SPARK-49620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49618) Union ( & UnionExec) nodes equality not take into account unaligned positions of branches causing NO ( reuse of exchange and cached plans)

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49618:
---
Labels: pull-request-available  (was: )

> Union ( & UnionExec) nodes equality not take into account unaligned positions 
> of branches causing NO ( reuse of exchange and cached plans)
> --
>
> Key: SPARK-49618
> URL: https://issues.apache.org/jira/browse/SPARK-49618
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.3.4, 3.4.3
>Reporter: Asif
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4
>
>
> Ideally, Union(plan1, plan2) and Union(plan2, plan1) are logically equal, 
> as long as the output attributes of plan1 and plan2 match in terms of 
> name, data type, metadata, etc. (though they differ in exprId).
> But because the current equality and hashCode depend on the order of the 
> children, the canonicalized plans do not match.
> As a result, exchange reuse does not happen in the following situation:
>  
> Exchange 1 = Union(plan1, plan2)
> Exchange 2 = Union(plan2, plan1)
>  
> Similarly, the cache lookup fails to pick up the InMemoryRelation.
>  
> A PR and bug tests for the above scenarios will be submitted.
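
The effect described above can be sketched in a few lines of Python. This is an illustration only, not Spark's `Union` or `UnionExec` code: the `Plan`, `OrderedUnion`, and `UnorderedUnion` classes and the string "canonical form" are hypothetical stand-ins, and the sketch does not address whether reordering branches is safe in every Spark plan.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical leaf plan identified only by its canonical form."""
    canonical: str


@dataclass(frozen=True)
class OrderedUnion:
    """Equality and hash follow child order, as the issue describes."""
    children: Tuple[Plan, ...]


class UnorderedUnion:
    """Equality and hash ignore child order by sorting canonical forms."""

    def __init__(self, *children: Plan) -> None:
        self.children = children
        # Sorting (rather than using a set) keeps duplicate branches distinct.
        self._key = tuple(sorted(c.canonical for c in children))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, UnorderedUnion) and self._key == other._key

    def __hash__(self) -> int:
        return hash(self._key)


p1, p2 = Plan("scan_a"), Plan("scan_b")

# A dictionary stands in for the exchange-reuse or cache lookup table.
reuse_table = {UnorderedUnion(p1, p2): "exchange-1"}
reused = reuse_table.get(UnorderedUnion(p2, p1))
```

With the order-sensitive class, the same lookup would miss, which mirrors the missed exchange reuse and cache lookup in the report.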






[jira] [Updated] (SPARK-49619) Upgrade Gradle to 8.10.1

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49619:
---
Labels: pull-request-available  (was: )

> Upgrade Gradle to 8.10.1
> 
>
> Key: SPARK-49619
> URL: https://issues.apache.org/jira/browse/SPARK-49619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49561) PIVOT + UNPIVOT operators

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49561:
---
Labels: pull-request-available  (was: )

> PIVOT + UNPIVOT operators
> -
>
> Key: SPARK-49561
> URL: https://issues.apache.org/jira/browse/SPARK-49561
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49557) WHERE operator

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49557:
---
Labels: pull-request-available  (was: )

> WHERE operator
> --
>
> Key: SPARK-49557
> URL: https://issues.apache.org/jira/browse/SPARK-49557
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49610) Use plan ID as the session-local plan cache key type

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49610:
---
Labels: pull-request-available  (was: )

> Use plan ID as the session-local plan cache key type
> 
>
> Key: SPARK-49610
> URL: https://issues.apache.org/jira/browse/SPARK-49610
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Changgyoo Park
>Priority: Major
>  Labels: pull-request-available
>
> Comparing protobuf messages is sometimes very expensive if the message is 
> very large.
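
As a rough illustration of the idea in the issue title, the Python sketch below keys a cache by a small integer ID instead of by the message itself. The `Plan` and `PlanCache` names are hypothetical, and the sketch assumes that two plans with the same ID within a session carry the same content; it is not the Spark Connect implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Plan:
    plan_id: int
    payload: bytes  # stand-in for a large serialized message


class PlanCache:
    """Hypothetical session-local cache keyed by plan ID, not by payload."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}

    def put(self, plan: Plan, analyzed: str) -> None:
        # Only the integer ID is hashed; the payload is never inspected.
        self._by_id[plan.plan_id] = analyzed

    def get(self, plan: Plan) -> Optional[str]:
        return self._by_id.get(plan.plan_id)


cache = PlanCache()
big_plan = Plan(plan_id=42, payload=b"x" * 1_000_000)
cache.put(big_plan, "analyzed-plan-42")

# A lookup with the same ID hits without comparing the 1 MB payloads.
hit = cache.get(Plan(plan_id=42, payload=big_plan.payload))
miss = cache.get(Plan(plan_id=43, payload=b""))
```

The lookup cost here depends only on the ID, whereas a cache keyed by the message would have to hash and compare the full payload.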






[jira] [Updated] (SPARK-49611) Introduce TVF all_collations()

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49611:
---
Labels: pull-request-available  (was: )

> Introduce TVF all_collations()
> --
>
> Key: SPARK-49611
> URL: https://issues.apache.org/jira/browse/SPARK-49611
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49597) Support non-column arguments in UDTF for simpler usage

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49597:
--

Assignee: (was: Apache Spark)

> Support non-column arguments in UDTF for simpler usage
> --
>
> Key: SPARK-49597
> URL: https://issues.apache.org/jira/browse/SPARK-49597
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Currently, a UDTF can only accept column arguments, which users may find 
> inconvenient.






[jira] [Assigned] (SPARK-49597) Support non-column arguments in UDTF for simpler usage

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49597:
--

Assignee: Apache Spark

> Support non-column arguments in UDTF for simpler usage
> --
>
> Key: SPARK-49597
> URL: https://issues.apache.org/jira/browse/SPARK-49597
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Currently, a UDTF can only accept column arguments, which users may find 
> inconvenient.






[jira] [Assigned] (SPARK-49609) Add API compatibility check between Classic and Connect

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49609:
--

Assignee: Apache Spark

> Add API compatibility check between Classic and Connect
> ---
>
> Key: SPARK-49609
> URL: https://issues.apache.org/jira/browse/SPARK-49609
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> We should ensure that every API has the same signature in Classic and Connect.






[jira] [Assigned] (SPARK-49606) Improve documentation of Pandas on Spark plotting API

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49606:
--

Assignee: Apache Spark

> Improve documentation of Pandas on Spark plotting API
> -
>
> Key: SPARK-49606
> URL: https://issues.apache.org/jira/browse/SPARK-49606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Improve documentation of Pandas on Spark plotting API following pandas 2.2 
> (stable), see https://pandas.pydata.org/docs/reference/frame.html.






[jira] [Assigned] (SPARK-49609) Add API compatibility check between Classic and Connect

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49609:
--

Assignee: (was: Apache Spark)

> Add API compatibility check between Classic and Connect
> ---
>
> Key: SPARK-49609
> URL: https://issues.apache.org/jira/browse/SPARK-49609
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We should ensure that every API has the same signature in Classic and Connect.






[jira] [Assigned] (SPARK-49606) Improve documentation of Pandas on Spark plotting API

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49606:
--

Assignee: (was: Apache Spark)

> Improve documentation of Pandas on Spark plotting API
> -
>
> Key: SPARK-49606
> URL: https://issues.apache.org/jira/browse/SPARK-49606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve documentation of Pandas on Spark plotting API following pandas 2.2 
> (stable), see https://pandas.pydata.org/docs/reference/frame.html.






[jira] [Updated] (SPARK-49609) Add API compatibility check between Classic and Connect

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49609:
---
Labels: pull-request-available  (was: )

> Add API compatibility check between Classic and Connect
> ---
>
> Key: SPARK-49609
> URL: https://issues.apache.org/jira/browse/SPARK-49609
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We should ensure that every API has the same signature in Classic and Connect.






[jira] [Assigned] (SPARK-49244) [M1] Further exception improvements

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49244:
--

Assignee: Apache Spark

> [M1] Further exception improvements
> ---
>
> Key: SPARK-49244
> URL: https://issues.apache.org/jira/browse/SPARK-49244
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dusan Tisma
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> We need to remove line numbers that were manually added to exceptions. 
> Currently some exceptions print the line number twice.
> Label exceptions need to use backquotes, the same as variables; that is, we 
> need to check that toSQLId, toSQLStmt, and similar methods are applied to all 
> identifiers.
> Maybe add some tests for {LINE} numbers?
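
For readers unfamiliar with the backquoting convention mentioned above, here is a small Python sketch of what quoting an identifier for an error message could look like. The functions `quote_identifier` and `to_sql_id` are hypothetical Python analogs written for this illustration, and the rule of doubling embedded backquotes is an assumption; they are not Spark's `toSQLId` implementation.

```python
from typing import Sequence


def quote_identifier(name: str) -> str:
    """Wrap one name part in backquotes, doubling any embedded backquote."""
    return "`" + name.replace("`", "``") + "`"


def to_sql_id(parts: Sequence[str]) -> str:
    """Quote each part of a multi-part identifier and join the parts with dots."""
    return ".".join(quote_identifier(part) for part in parts)


# Hypothetical error message that formats a label like any other identifier.
label = to_sql_id(["outer_block", "lbl"])
message = f"The label {label} is not defined."
```

Routing every identifier through one helper like this keeps labels and variables formatted the same way in all messages.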






[jira] [Assigned] (SPARK-49244) [M1] Further exception improvements

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49244:
--

Assignee: (was: Apache Spark)

> [M1] Further exception improvements
> ---
>
> Key: SPARK-49244
> URL: https://issues.apache.org/jira/browse/SPARK-49244
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dusan Tisma
>Priority: Major
>  Labels: pull-request-available
>
> We need to remove line numbers that were manually added to exceptions. 
> Currently some exceptions print the line number twice.
> Label exceptions need to use backquotes, the same as variables; that is, we 
> need to check that toSQLId, toSQLStmt, and similar methods are applied to all 
> identifiers.
> Maybe add some tests for {LINE} numbers?






[jira] [Assigned] (SPARK-49162) Push down date_trunc function

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49162:
--

Assignee: Apache Spark

> Push down date_trunc function
> -
>
> Key: SPARK-49162
> URL: https://issues.apache.org/jira/browse/SPARK-49162
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.2
>Reporter: Ivan Kukrkic
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>
> Postgres function date_trunc should be pushed down.






[jira] [Updated] (SPARK-49606) Improve documentation of Pandas on Spark plotting API

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49606:
---
Labels: pull-request-available  (was: )

> Improve documentation of Pandas on Spark plotting API
> -
>
> Key: SPARK-49606
> URL: https://issues.apache.org/jira/browse/SPARK-49606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve documentation of Pandas on Spark plotting API following pandas 2.2 
> (stable), see https://pandas.pydata.org/docs/reference/frame.html.






[jira] [Updated] (SPARK-49605) Fix the prompt when `ascendingOrder` is `DataTypeMismatch` in `SortArray`

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49605:
---
Labels: pull-request-available  (was: )

> Fix the prompt when `ascendingOrder` is `DataTypeMismatch` in `SortArray`
> -
>
> Key: SPARK-49605
> URL: https://issues.apache.org/jira/browse/SPARK-49605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49594) Add check on whether columnFamilies were added or removed to write StateSchemaV3 file

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49594:
---
Labels: pull-request-available  (was: )

> Add check on whether columnFamilies were added or removed to write 
> StateSchemaV3 file
> -
>
> Key: SPARK-49594
> URL: https://issues.apache.org/jira/browse/SPARK-49594
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Eric Marnadi
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49602) Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}`

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49602:
---
Labels: pull-request-available  (was: )

> Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}`
> 
>
> Key: SPARK-49602
> URL: https://issues.apache.org/jira/browse/SPARK-49602
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49262) Implement full trim sensitivity support

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49262:
---
Labels: pull-request-available  (was: )

> Implement full trim sensitivity support
> ---
>
> Key: SPARK-49262
> URL: https://issues.apache.org/jira/browse/SPARK-49262
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49600) Remove `Python 3.6 and older`-related logic from `try_simplify_traceback`

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49600:
---
Labels: pull-request-available  (was: )

> Remove `Python 3.6 and older`-related logic  from `try_simplify_traceback`
> --
>
> Key: SPARK-49600
> URL: https://issues.apache.org/jira/browse/SPARK-49600
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49386) Add memory based thresholds for shuffle spill

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49386:
---
Labels: pull-request-available  (was: )

> Add memory based thresholds for shuffle spill
> -
>
> Key: SPARK-49386
> URL: https://issues.apache.org/jira/browse/SPARK-49386
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Priority: Major
>  Labels: pull-request-available
>
> Currently, spilling can only be controlled by the number of elements, by 
> configuring `spark.shuffle.spill.numElementsForceSpillThreshold`. In some 
> scenarios, a single row may be very large in memory.
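
The motivation can be sketched with a small Python model of a spill decision that checks both an element count and a memory size. The `SpillTracker` class and its `max_bytes` parameter are hypothetical and are not Spark configuration names; only the element-count idea corresponds to the setting named in the description.

```python
from typing import List


class SpillTracker:
    """Hypothetical tracker that forces a spill on either threshold."""

    def __init__(self, max_elements: int, max_bytes: int) -> None:
        self.max_elements = max_elements
        self.max_bytes = max_bytes
        self.elements = 0
        self.bytes_used = 0

    def add(self, row: bytes) -> bool:
        """Record one buffered row; return True if a spill should be forced."""
        self.elements += 1
        self.bytes_used += len(row)
        if self.elements >= self.max_elements or self.bytes_used >= self.max_bytes:
            # Simulate the spill by resetting the in-memory counters.
            self.elements = 0
            self.bytes_used = 0
            return True
        return False


ONE_MIB = 1024 * 1024

# Three 1 MiB rows: far below the element threshold, but the second row
# reaches the 2 MiB memory threshold.
tracker = SpillTracker(max_elements=1_000_000, max_bytes=2 * ONE_MIB)
decisions: List[bool] = [tracker.add(b"\0" * ONE_MIB) for _ in range(3)]
```

With only the element-count check, none of these rows would trigger a spill, even though each one is large.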






[jira] [Updated] (SPARK-49599) Upgrade snappy-java to 1.1.10.7

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49599:
---
Labels: pull-request-available  (was: )

> Upgrade snappy-java to 1.1.10.7
> ---
>
> Key: SPARK-49599
> URL: https://issues.apache.org/jira/browse/SPARK-49599
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49598) Support to add custom user defined labels on OnDemand PVCs

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49598:
---
Labels: pull-request-available  (was: )

> Support to add custom user defined labels on OnDemand PVCs
> --
>
> Key: SPARK-49598
> URL: https://issues.apache.org/jira/browse/SPARK-49598
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Prathit Malik
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, when a user sets 
> volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand,
> PVCs are created with only one label, i.e. spark-app-selector = spark.app.id.
> The objective of this Jira is to support custom labels for OnDemand PVCs.






[jira] [Updated] (SPARK-49597) Support non-column arguments in UDTF for simpler usage

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49597:
---
Labels: pull-request-available  (was: )

> Support non-column arguments in UDTF for simpler usage
> --
>
> Key: SPARK-49597
> URL: https://issues.apache.org/jira/browse/SPARK-49597
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Currently, a UDTF can only accept column arguments, which users may find 
> inconvenient.






[jira] [Assigned] (SPARK-48355) [M1] Support for CASE statement

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48355:
--

Assignee: (was: Apache Spark)

> [M1] Support for CASE statement
> ---
>
> Key: SPARK-48355
> URL: https://issues.apache.org/jira/browse/SPARK-48355
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: David Milicevic
>Priority: Major
>  Labels: pull-request-available
>
> Details TBD.
>  
> For more details:
>  * Design doc in parent Jira item.
>  * [SQL ref 
> spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]






[jira] [Assigned] (SPARK-48355) [M1] Support for CASE statement

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48355:
--

Assignee: Apache Spark

> [M1] Support for CASE statement
> ---
>
> Key: SPARK-48355
> URL: https://issues.apache.org/jira/browse/SPARK-48355
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: David Milicevic
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Details TBD.
>  
> For more details:
>  * Design doc in parent Jira item.
>  * [SQL ref 
> spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]









[jira] [Updated] (SPARK-49596) Improve the expression `FormatString`

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49596:
---
Labels: pull-request-available  (was: )

> Improve the expression `FormatString`
> -
>
> Key: SPARK-49596
> URL: https://issues.apache.org/jira/browse/SPARK-49596
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49162) Push down date_trunc function

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49162:
--

Assignee: (was: Apache Spark)

> Push down date_trunc function
> -
>
> Key: SPARK-49162
> URL: https://issues.apache.org/jira/browse/SPARK-49162
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.2
>Reporter: Ivan Kukrkic
>Priority: Minor
>  Labels: pull-request-available
>
> Postgres function date_trunc should be pushed down.






[jira] [Updated] (SPARK-49585) Get rid of unnecessary executions list in SessionHolder

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49585:
---
Labels: pull-request-available  (was: )

> Get rid of unnecessary executions list in SessionHolder
> ---
>
> Key: SPARK-49585
> URL: https://issues.apache.org/jira/browse/SPARK-49585
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Changgyoo Park
>Priority: Minor
>  Labels: pull-request-available
>
> ExecutionManager.executions can fully substitute SessionHolder.executions.
> Adverse effect:
> - interrupt* will take longer if there are many sessions with many executions 
> -> SessionHolder manages a set of operation IDs instead of ExecuteHolders.






[jira] [Updated] (SPARK-49595) Fix DataFrame.unpivot/melt in Spark Connect

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49595:
---
Labels: pull-request-available  (was: )

> Fix DataFrame.unpivot/melt in Spark Connect
> ---
>
> Key: SPARK-49595
> URL: https://issues.apache.org/jira/browse/SPARK-49595
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49569) Introduce Shim for missing spark/core classes

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49569:
---
Labels: pull-request-available  (was: )

> Introduce Shim for missing spark/core classes
> -
>
> Key: SPARK-49569
> URL: https://issues.apache.org/jira/browse/SPARK-49569
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>
> Introduce shims for SparkContext, RDD, and QueryExecution. This will have to 
> be in a separate module; it is supposed to be a compile-time dependency 
> for the sql/api project, and an actual dependency for an independent Spark 
> Connect client.
> We need these three classes to support all user facing API in the sql/api 
> project. This will allow us to make the classes the primary interface for 
> Scala Dataset operations.
> For connect these methods will throw an (actionable) error, and for the 
> classic client they will just work. On the connect side in the future we can 
> use this to build better errors, and provide (method specific) mitigations. 






[jira] [Updated] (SPARK-49590) E2E test template includes invalid spec field

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49590:
---
Labels: pull-request-available  (was: )

> E2E test template includes invalid spec field
> -
>
> Key: SPARK-49590
> URL: https://issues.apache.org/jira/browse/SPARK-49590
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-49574) ExpressionEncoder should track its AgnosticEncoder

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49574:
---
Labels: pull-request-available  (was: )

> ExpressionEncoder should track its AgnosticEncoder
> --
>
> Key: SPARK-49574
> URL: https://issues.apache.org/jira/browse/SPARK-49574
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49584) Upgrade log4j2 to 2.24.0

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49584:
--

Assignee: (was: Apache Spark)

> Upgrade log4j2 to 2.24.0
> 
>
> Key: SPARK-49584
> URL: https://issues.apache.org/jira/browse/SPARK-49584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49548) Get rid of coarse-locking in SparkConnectSessionManager

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49548:
--

Assignee: Apache Spark

> Get rid of coarse-locking in SparkConnectSessionManager
> ---
>
> Key: SPARK-49548
> URL: https://issues.apache.org/jira/browse/SPARK-49548
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Changgyoo Park
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>
> Related to https://issues.apache.org/jira/browse/SPARK-49544.
> -> This has never caused a real world problem, but we had better fix it in 
> tandem with https://issues.apache.org/jira/browse/SPARK-49544.









[jira] [Assigned] (SPARK-49398) Cache Table with Parameter markers returns wrong error

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49398:
--

Assignee: Apache Spark

> Cache Table with Parameter markers returns wrong error
> --
>
> Key: SPARK-49398
> URL: https://issues.apache.org/jira/browse/SPARK-49398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available, starter
>
> While investigating the OSS code structure, it was found that 
> `CacheTableAsSelect`, when used with parameter markers in the select part of 
> the query, fails with `UNBOUND_SQL_PARAMETER` even though logically it should 
> fail with `UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT`. The 
> reason for the second error is that `CacheTableAsSelect` creates a temporary 
> view, which should follow the same rules for parameter markers as views do.






[jira] [Updated] (SPARK-49584) Upgrade log4j2 to 2.24.0

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49584:
---
Labels: pull-request-available  (was: )

> Upgrade log4j2 to 2.24.0
> 
>
> Key: SPARK-49584
> URL: https://issues.apache.org/jira/browse/SPARK-49584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49398) Cache Table with Parameter markers returns wrong error

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49398:
--

Assignee: (was: Apache Spark)

> Cache Table with Parameter markers returns wrong error
> --
>
> Key: SPARK-49398
> URL: https://issues.apache.org/jira/browse/SPARK-49398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Minor
>  Labels: pull-request-available, starter
>
> While investigating the OSS code structure, it was found that 
> `CacheTableAsSelect`, when used with parameter markers in the select part of 
> the query, fails with `UNBOUND_SQL_PARAMETER` even though logically it should 
> fail with `UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT`. The 
> reason for the second error is that `CacheTableAsSelect` creates a temporary 
> view, which should follow the same rules for parameter markers as views do.









[jira] [Updated] (SPARK-49582) Fix "dispatch_window_method" utility and documentation

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49582:
---
Labels: pull-request-available  (was: )

> Fix "dispatch_window_method" utility and documentation
> --
>
> Key: SPARK-49582
> URL: https://issues.apache.org/jira/browse/SPARK-49582
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Fix "dispatch_window_method" utility and documentation






[jira] [Updated] (SPARK-49578) Change error message for CAST_INVALID_INPUT and CAST_OVERFLOW

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49578:
---
Labels: pull-request-available  (was: )

> Change error message for CAST_INVALID_INPUT and CAST_OVERFLOW
> -
>
> Key: SPARK-49578
> URL: https://issues.apache.org/jira/browse/SPARK-49578
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> CAST_INVALID_INPUT and CAST_OVERFLOW both contain suggested fixes for turning 
> off ANSI mode. Now that ANSI mode is on by default in Spark 4.0.0, we want to 
> keep suggestions of this kind to a minimum. The existing `try_cast` 
> implementation provides the same casting behavior as with ANSI mode off, and 
> suggesting it should be sufficient for users to move forward.
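The difference between a strict cast and a `try_cast`-style cast can be sketched in plain Python. This is only an analogy under stated assumptions: the function names `strict_cast_int` and `try_cast_int` are hypothetical, the 32-bit range check stands in for an overflow condition, and the parsing rules are Python's, not Spark's.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def strict_cast_int(value):
    """Raise on malformed input or 32-bit overflow, like a strict cast."""
    number = int(value.strip())  # raises ValueError on malformed input
    if not INT_MIN <= number <= INT_MAX:
        raise OverflowError(f"{value!r} does not fit in a 32-bit integer")
    return number


def try_cast_int(value):
    """Return None instead of raising, mirroring the idea behind try_cast."""
    try:
        return strict_cast_int(value)
    except (ValueError, OverflowError):
        return None
```

Pointing users at the tolerant variant lets them handle bad rows explicitly without changing a session-wide mode.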






[jira] [Updated] (SPARK-49579) Rename errorClass in checkError()

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49579:
---
Labels: pull-request-available  (was: )

> Rename errorClass in checkError()
> -
>
> Key: SPARK-49579
> URL: https://issues.apache.org/jira/browse/SPARK-49579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Rename errorClass to condition in checkError() and related functions to 
> follow up on the agreement in https://issues.apache.org/jira/browse/SPARK-46810






[jira] [Updated] (SPARK-49576) Upload Python logs in CI

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49576:
---
Labels: pull-request-available  (was: )

> Upload Python logs in CI
> 
>
> Key: SPARK-49576
> URL: https://issues.apache.org/jira/browse/SPARK-49576
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> e.g., 
> /__w/spark/spark/python/target/28a23950-46c7-45c5-a9b7-42e7d9b21518/python3.12__pyspark.sql.tests.connect.test_connect_session__ah_ug0xu.log






[jira] [Updated] (SPARK-49567) Use `classic` instead of `vanilla` from PySpark code base

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49567:
---
Labels: pull-request-available  (was: )

> Use `classic` instead of `vanilla` from PySpark code base
> -
>
> Key: SPARK-49567
> URL: https://issues.apache.org/jira/browse/SPARK-49567
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We decided to use classic for legacy PySpark






[jira] [Updated] (SPARK-49544) Severe lock contention in SparkConnectExecutionManager

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49544:
---
Labels: pull-request-available  (was: )

> Severe lock contention in SparkConnectExecutionManager
> --
>
> Key: SPARK-49544
> URL: https://issues.apache.org/jira/browse/SPARK-49544
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Changgyoo Park
>Priority: Major
>  Labels: pull-request-available
>
> Critical sections protected by executionsLock can become too broad when there 
> are too many ExecuteHolders, e.g., >= 10^4. The problem is aggravated when 
> there are too many threads in the system: priority inversion.
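One general way to narrow a critical section like the one described is to hold the lock only for a cheap copy and for the final removals, and to run the per-entry checks outside it. The sketch below is not the change made in Spark; `ExecutionRegistry`, its methods, and the timestamp-based expiry rule are hypothetical names chosen for illustration.

```python
import threading


class ExecutionRegistry:
    """Hypothetical registry of executions keyed by an opaque id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_active = {}  # execution key -> last activity timestamp

    def add(self, key, last_active):
        with self._lock:
            self._last_active[key] = last_active

    def remove_expired(self, now, timeout):
        # Hold the lock only long enough to copy the current entries.
        with self._lock:
            snapshot = list(self._last_active.items())
        # The per-entry check runs without the lock.
        candidates = [key for key, seen in snapshot if now - seen > timeout]
        removed = []
        with self._lock:
            for key in candidates:
                # Re-check: the entry may have been refreshed or removed since.
                seen = self._last_active.get(key)
                if seen is not None and now - seen > timeout:
                    del self._last_active[key]
                    removed.append(key)
        return removed
```

The re-check under the second lock acquisition keeps the result correct even if another thread touched an entry between the snapshot and the removal.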






[jira] [Updated] (SPARK-49545) Increase timeout for build from 3 to 4 hours

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49545:
---
Labels: pull-request-available  (was: )

> Increase timeout for build from 3 to 4 hours
> 
>
> Key: SPARK-49545
> URL: https://issues.apache.org/jira/browse/SPARK-49545
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/spark/actions/workflows/build_python_3.12.yml fails 
> after hitting the 3-hour timeout. We should increase it to 4 hours.






[jira] [Assigned] (SPARK-49501) Catalog createTable API is double-escaping paths

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49501:
--

Assignee: (was: Apache Spark)

> Catalog createTable API is double-escaping paths
> 
>
> Key: SPARK-49501
> URL: https://issues.apache.org/jira/browse/SPARK-49501
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Christos Stavrakakis
>Priority: Major
>  Labels: pull-request-available
>
> Creating an external table using {{spark.catalog.createTable}} results in 
> incorrect escaping of special chars in paths.
> Consider the following code:
> {{spark.catalog.createTable("testTable", source = "parquet", 
> schema = new StructType().add("id", "int"), description = "", options = 
> Map("path" -> "/tmp/test table"))}}
> The above call creates a table that is stored in {{/tmp/test%20table}} 
> instead of {{/tmp/test table}}. Note that this behaviour is different 
> from the SQL API, e.g. {{create table testTable(id int) using parquet 
> location '/tmp/test table'}}
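The general phenomenon behind this kind of bug can be reproduced with Python's standard `urllib.parse` module. The sketch does not trace Spark's actual code path; it only shows how a path that is escaped one time too many ends up with a literal `%20` after a single decode.

```python
from urllib.parse import quote, unquote

path = "/tmp/test table"

escaped_once = quote(path)           # space becomes %20
escaped_twice = quote(escaped_once)  # the % itself becomes %25

# Decoding once recovers the original only if the path was escaped once.
decoded_from_once = unquote(escaped_once)
decoded_from_twice = unquote(escaped_twice)

print(escaped_once)        # /tmp/test%20table
print(escaped_twice)       # /tmp/test%2520table
print(decoded_from_twice)  # /tmp/test%20table
```

If one layer escapes the path and a later layer treats the escaped string as the literal location, the table lands in a directory whose name contains `%20`.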






[jira] [Assigned] (SPARK-49506) Optimize ArrayBinarySearch for foldable array

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49506:
--

Assignee: BingKun Pan  (was: Apache Spark)

> Optimize ArrayBinarySearch for foldable array
> -
>
> Key: SPARK-49506
> URL: https://issues.apache.org/jira/browse/SPARK-49506
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49540) Unify the usage of `distributed_sequence_id`

2024-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49540:
---
Labels: pull-request-available  (was: )

> Unify the usage of `distributed_sequence_id`
> 
>
> Key: SPARK-49540
> URL: https://issues.apache.org/jira/browse/SPARK-49540
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49538) Detect unused message parameters

2024-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49538:
---
Labels: pull-request-available  (was: )

> Detect unused message parameters
> 
>
> Key: SPARK-49538
> URL: https://issues.apache.org/jira/browse/SPARK-49538
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>  Labels: pull-request-available
>
> The passed error message parameters and the placeholders in the message format 
> may not match. From a code maintainability perspective, it would be nice 
> to detect such cases while running tests.
> For example, the error message format could look like:
> {code}
>   "CANNOT_UP_CAST_DATATYPE" : {
> "message" : [
>   "Cannot up cast <expression> from <sourceType> to <targetType>.",
>   "<details>"
> ],
> "sqlState" : "42846"
>   },
> {code}
>  
> but the passed message parameters have an extra parameter:
> {code:scala}
>   messageParameters = Map(
> "expression" -> "CAST('aaa' AS LONG)",
> "sourceType" -> "STRING",
> "targetType" -> "LONG",
> "op" -> "CAST", // unused parameter
> "details" -> "implicit cast"
>   ))
> {code}
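
One way such a check could work is sketched below. It assumes placeholders are written as `<name>` in the message format; the object and method names are hypothetical and not taken from the Spark code base. The idea is to collect the placeholder names from the template and subtract them from the keys of the passed parameter map.

```scala
// Hypothetical sketch; names are illustrative, not Spark's test utilities.
object UnusedParamsSketch {
  // Matches placeholders of the assumed form <name>.
  private val Placeholder = "<([a-zA-Z0-9_-]+)>".r

  // Names of all placeholders that appear in the template lines.
  def placeholders(template: Seq[String]): Set[String] =
    template.flatMap(line => Placeholder.findAllMatchIn(line).map(_.group(1))).toSet

  // Parameters that were passed but are never referenced by the template.
  def unusedParameters(template: Seq[String], params: Map[String, String]): Set[String] =
    params.keySet -- placeholders(template)

  def main(args: Array[String]): Unit = {
    val template = Seq(
      "Cannot up cast <expression> from <sourceType> to <targetType>.",
      "<details>")
    val params = Map(
      "expression" -> "CAST('aaa' AS LONG)",
      "sourceType" -> "STRING",
      "targetType" -> "LONG",
      "op" -> "CAST",
      "details" -> "implicit cast")
    println(unusedParameters(template, params))
  }
}
```

A test helper could fail whenever the returned set is non-empty, which would flag the `op` parameter in the example above.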



[jira] [Updated] (SPARK-49537) Incorrect Join stats estimate

2024-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49537:
---
Labels: pull-request-available  (was: )

> Incorrect Join stats estimate
> -
>
> Key: SPARK-49537
> URL: https://issues.apache.org/jira/browse/SPARK-49537
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: diable CBO.png, enable CBO.png
>
>
> Error message:
> {noformat}
> org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.SparkException: Cannot broadcast the table that is larger 
> than 4GB: 4GB
> at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:45)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:340)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:198)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> {noformat}
> Left side stats:
> {noformat}
> 36126 bytes, 2150 rows
> {noformat}
> |info_name|info_value|
> |col_name|brand|
> |data_type|string|
> |comment|NULL|
> |min|NULL|
> |max|NULL|
> |num_nulls|1|
> |distinct_count|1980|
> |avg_col_len|9|
> |max_col_len|38|
> |histogram|NULL|
> Right side stats:
> {noformat}
> 13250653950 bytes, 1470064309 rows
> {noformat}
> |info_name|info_value|
> |col_name|brand|
> |data_type|string|
> |comment|NULL|
> |min|NULL|
> |max|NULL|
> |num_nulls|320713790|
> |distinct_count|3896196|
> |avg_col_len|8|
> |max_col_len|69|
> |histogram|NULL|
> Join plan:
> {noformat}
> == Optimized Logical Plan ==
> Project [brand#612428, leaf_categ_name#612429, leaf_categ_id#612430, 
> GMV_LC_AMT#615773, item_price#615665], Statistics(sizeInBytes=2.41E+25 B)
> +- Join Inner, ((item_id#615802 = item_id#612432) AND (leaf_categ_id#615805 = 
> leaf_categ_id#612430)), Statistics(sizeInBytes=3.07E+25 B)
>:- Project [brand#612428, leaf_categ_name#612429, leaf_categ_id#612430, 
> item_id#612432], Statistics(sizeInBytes=55.7 MiB, rowCount=8.11E+5)
>:  +- Join Inner, (brand#612434 = brand#612428), 
> Statistics(sizeInBytes=71.1 MiB, rowCount=8.11E+5)
>: :- Project [brand#612428, leaf_categ_name#612429, 
> leaf_categ_id#612430], Statistics(sizeInBytes=136.4 KiB, rowCount=2.15E+3)
>: :  +- Filter (isnotnull(leaf_categ_id#612430) AND 
> isnotnull(brand#612428)), Statistics(sizeInBytes=170.0 KiB, rowCount=2.15E+3)
>: : +- Relation 
> spark_catalog.tableA[brand#612428,leaf_categ_name#612429,leaf_categ_id#612430,dom_gmv#612431]
>  parquet, Statistics(sizeInBytes=170.1 KiB, rowCount=2.15E+3)
>: +- Project [item_id#612432, brand#612434], 
> Statistics(sizeInBytes=38.5 GiB, rowCount=1.15E+9)
>:+- Filter (isnotnull(item_id#612432) AND 
> isnotnull(brand#612434)), Statistics(sizeInBytes=42.8 GiB, rowCount=1.15E+9)
>:   +- Relation 
> spark_catalog.tableB[item_id#612432,auct_end_dt#612433,brand#612434] parquet, 
> Statistics(sizeInBytes=54.8 GiB, rowCount=1.47E+9)
>+- Project [item_id#615802, leaf_categ_id#615805, CASE WHEN 
> tax_state#615824 IN (UK,EU) THEN cast(broundcast(quantity#615828 as 
> decimal(10,0)) * item_price#615827) + item_sales_tax_amt#615887) / 
> cast(quantity#615828 as decimal(10,0))), 2) as decimal(38,2)) ELSE 
> cast(item_price#615827 as decimal(38,2)) END AS item_price#615665, 
> coalesce(GMV_LC_AMT#615933, 0.00) AS gmv_lc_amt#615773], 
> Statistics(sizeInBytes=466.4 PiB)
>   +- Join LeftOuter, (cast(byr_curncy_id#615921 as decimal(9,0)) = 
> curncy_id#615796), Statistics(sizeInBytes=799.5 PiB)
>  :- Project [item_id#615802, leaf_categ_id#615805, tax_state#615824, 
> item_price#615827, quantity#615828, item_sales_tax_amt#615887, 
> byr_curncy_id#615921, GMV_LC_AMT#615933], Statistics(sizeInBytes=756.7 TiB)
>  :  +- Join LeftOuter, (cast(lstg_curncy_id#615848 as decimal(9,0)) = 
> curncy_id#612267), Statistics(sizeInBytes=894.2 TiB)
>  : :- Project [item_id#615802, leaf_categ_id#615805, 
> tax_state#615824, item_price#615827, quantity#615828, lstg_curncy_id#615848, 
> item_sales_tax_amt#615887, byr_curncy_id#615921, GMV_LC_AMT#615933], 
> Statistics(sizeInBytes=846.3 GiB)
>  : :  +- Filter ((isnotnull(GMV_DT#615926) AND 
> isnotnull(seller_id#615806)) AND (GMV_DT#615926 >= 2023-09-01)) AND 
> (GMV_DT#615926 <= 2024-08-31)) AND isnotnull(item_id#615802)) AND 
> isnotnull(leaf_categ_id#615805)) AND site_id#615804 IN (0,100)) AND NOT 
> checkou

[jira] [Updated] (SPARK-49536) Add error handling for python streaming data source record prefetching

2024-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49536:
---
Labels: pull-request-available  (was: )

> Add error handling for python streaming data source record prefetching
> --
>
> Key: SPARK-49536
> URL: https://issues.apache.org/jira/browse/SPARK-49536
> Project: Spark
>  Issue Type: Task
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently there is an assert that the status code returned from the python worker 
> is SpecialLengths.START_ARROW_STREAM when the python source runner is prefetching 
> records. To improve debuggability, check the status code and rethrow a 
> runtime error with a detailed error message.
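
The change described above could look roughly like the following sketch. The constants, the object name, and the `readErrorMessage` callback are hypothetical stand-ins for illustration; the real status codes live in Spark's `SpecialLengths`, and the real runner reads the error text from the worker stream.

```scala
// Hypothetical sketch; constants and names are illustrative stand-ins.
object PrefetchStatusSketch {
  val StartArrowStream = -6      // assumed value, for illustration
  val PythonExceptionThrown = -2 // assumed value, for illustration

  // Instead of asserting on the status code, branch on it and surface
  // a descriptive error when the worker did not start an Arrow stream.
  def checkStatus(status: Int, readErrorMessage: () => String): Unit = status match {
    case StartArrowStream => () // expected: records follow
    case PythonExceptionThrown =>
      throw new RuntimeException(
        s"Python worker failed while prefetching records: ${readErrorMessage()}")
    case other =>
      throw new IllegalStateException(
        s"Unexpected status code $other while prefetching records")
  }
}
```

Compared with a bare assert, the caller sees the worker's own error text rather than an assertion failure.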



[jira] [Updated] (SPARK-48376) [M1] Support for ITERATE statement

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48376:
---
Labels: pull-request-available  (was: )

> [M1] Support for ITERATE statement
> --
>
> Key: SPARK-48376
> URL: https://issues.apache.org/jira/browse/SPARK-48376
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: David Milicevic
>Assignee: David Milicevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add support for ITERATE statement in WHILE (and other) loops to SQL scripting 
> parser & interpreter.
> This is the same functionality as CONTINUE in other languages.
>  
> For more details:
>  * Design doc in parent Jira item.
>  * [SQL ref 
> spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit].



[jira] [Updated] (SPARK-49424) Consolidate Encoders in sql/api

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49424:
---
Labels: pull-request-available  (was: )

> Consolidate Encoders in sql/api
> ---
>
> Key: SPARK-49424
> URL: https://issues.apache.org/jira/browse/SPARK-49424
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Assigned] (SPARK-49534) `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the classpath

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49534:
--

Assignee: (was: Apache Spark)

> `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the 
> classpath
> 
>
> Key: SPARK-49534
> URL: https://issues.apache.org/jira/browse/SPARK-49534
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Assigned] (SPARK-49534) `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the classpath

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49534:
--

Assignee: Apache Spark

> `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the 
> classpath
> 
>
> Key: SPARK-49534
> URL: https://issues.apache.org/jira/browse/SPARK-49534
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49534) `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the classpath

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49534:
---
Labels: pull-request-available  (was: )

> `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the 
> classpath
> 
>
> Key: SPARK-49534
> URL: https://issues.apache.org/jira/browse/SPARK-49534
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49527) Generate Spark Operator Config Property Doc

2024-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49527:
---
Labels: pull-request-available  (was: )

> Generate Spark Operator Config Property Doc
> ---
>
> Key: SPARK-49527
> URL: https://issues.apache.org/jira/browse/SPARK-49527
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49505) Create SQL functions to generate random strings or numbers within ranges

2024-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49505:
---
Labels: pull-request-available  (was: )

> Create SQL functions to generate random strings or numbers within ranges
> 
>
> Key: SPARK-49505
> URL: https://issues.apache.org/jira/browse/SPARK-49505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49526) Windows-style paths are unsupported in ArtifactManager

2024-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49526:
---
Labels: pull-request-available  (was: )

> Windows-style paths are unsupported in ArtifactManager
> --
>
> Key: SPARK-49526
> URL: https://issues.apache.org/jira/browse/SPARK-49526
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Windows-based clients will run into an issue when using the 
> `addArtifact` API, as the path passed to the server would contain backslashes, 
> which the server would interpret as part of the file name rather than as a 
> separator. 
> E.g., if the client sends the name `pyfiles\abc.txt` to the server, then the 
> artifact would be written out as `/pyfiles\abc.txt` 
> instead of the correct `\pyfiles\abc.txt`
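
A minimal sketch of one possible remedy follows, assuming the server simply treats backslashes in the client-supplied name as separators before resolving it. The object name, method names, and artifact root are hypothetical; this is not the actual ArtifactManager code.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical sketch; not the actual ArtifactManager implementation.
object ArtifactPathSketch {
  // Treat '\' like '/' so a Windows-style name maps to nested directories.
  def normalize(clientName: String): String =
    clientName.replace('\\', '/')

  // Resolve the normalized name against the server-side artifact root.
  def resolve(artifactRoot: Path, clientName: String): Path =
    artifactRoot.resolve(normalize(clientName)).normalize()

  def main(args: Array[String]): Unit = {
    val root = Paths.get("artifacts") // hypothetical root directory
    println(resolve(root, "pyfiles\\abc.txt"))
  }
}
```

With this normalization the file lands in a `pyfiles` directory under the root instead of becoming a single file whose name contains a backslash.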



[jira] [Updated] (SPARK-49525) Log improvement for server side streaming query listener bus listener

2024-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49525:
---
Labels: pull-request-available  (was: )

> Log improvement for server side streaming query listener bus listener
> -
>
> Key: SPARK-49525
> URL: https://issues.apache.org/jira/browse/SPARK-49525
> Project: Spark
>  Issue Type: Task
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Assigned] (SPARK-49518) Use build-helper-maven-plugin to manage the code for volcano

2024-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49518:
--

Assignee: Apache Spark

> Use build-helper-maven-plugin to manage the code for volcano
> 
>
> Key: SPARK-49518
> URL: https://issues.apache.org/jira/browse/SPARK-49518
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49507) Fix Expected only partition pruning predicates exception

2024-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49507:
---
Labels: pull-request-available  (was: )

> Fix Expected only partition pruning predicates exception
> 
>
> Key: SPARK-49507
> URL: https://issues.apache.org/jira/browse/SPARK-49507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0, 3.5.3
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>
> How to reproduce:
> {code:scala}
> sql("CREATE TABLE t (ID BIGINT, DT STRING) USING parquet PARTITIONED BY (DT)")
> sql("set spark.sql.hive.metastorePartitionPruningFastFallback=true")
> sql("select * from t where dt=20240820").show
> {code}
> {noformat}
> org.apache.spark.sql.AnalysisException: Expected only partition pruning 
> predicates: List(isnotnull(DT#21), (cast(DT#21 as bigint) = 20240820)).
>   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.nonPartitionPruningPredicatesNotExpectedError(QueryCompilationErrors.scala:2414)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.generatePartitionPredicateByFilter(ExternalCatalo
> {noformat}



[jira] [Updated] (SPARK-49519) Combine options of table and relation when create CSVScanBuilder

2024-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49519:
---
Labels: pull-request-available  (was: )

> Combine options of table and relation when create CSVScanBuilder
> 
>
> Key: SPARK-49519
> URL: https://issues.apache.org/jira/browse/SPARK-49519
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiayi Liu
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the {{CSVTable}} only uses the options from {{relation}} when 
> constructing the {{CSVScanBuilder}}, which leads to the omission of the 
> contents in {{CSVTable.options}}. For the {{TableCatalog}}, the 
> {{dsOptions}} can be set into the {{CSVTable.options}} returned by the 
> {{TableCatalog.loadTable}} method. If only the relation {{options}} are used 
> here, the {{TableCatalog}} will not be able to pass {{dsOptions}} that 
> contain CSV options to {{CSVScan}}.
> Combining the two sets of options is a better approach.
>  
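
Combining the two option sets could be sketched as a simple map merge. The precedence rule (relation options override table options) and the case-insensitive key handling are assumptions made for this illustration; the object and method names are hypothetical.

```scala
import java.util.Locale

// Hypothetical sketch; precedence and key handling are assumptions.
object CombinedOptionsSketch {
  private def lowerKeys(m: Map[String, String]): Map[String, String] =
    m.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Merge table-level and relation-level options. Keys are compared
  // case-insensitively; on conflict the relation-level value wins.
  def combine(
      tableOptions: Map[String, String],
      relationOptions: Map[String, String]): Map[String, String] =
    lowerKeys(tableOptions) ++ lowerKeys(relationOptions)

  def main(args: Array[String]): Unit = {
    val fromCatalog = Map("header" -> "true", "sep" -> ";")
    val fromRelation = Map("SEP" -> ",")
    println(combine(fromCatalog, fromRelation))
  }
}
```

Here the catalog-provided `header` option survives, while the relation's `sep` overrides the catalog's value.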



[jira] [Updated] (SPARK-49516) Upgrade the minimum K8s version to v1.28

2024-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49516:
---
Labels: pull-request-available  (was: )

> Upgrade the minimum K8s version to v1.28
> 
>
> Key: SPARK-49516
> URL: https://issues.apache.org/jira/browse/SPARK-49516
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Project Infra
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Assigned] (SPARK-49509) Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect

2024-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49509:
--

Assignee: Apache Spark

> Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect
> --
>
> Key: SPARK-49509
> URL: https://issues.apache.org/jira/browse/SPARK-49509
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49509) Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect

2024-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49509:
---
Labels: pull-request-available  (was: )

> Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect
> --
>
> Key: SPARK-49509
> URL: https://issues.apache.org/jira/browse/SPARK-49509
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49506) Optimize ArrayBinarySearch for foldable array

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49506:
---
Labels: pull-request-available  (was: )

> Optimize ArrayBinarySearch for foldable array
> -
>
> Key: SPARK-49506
> URL: https://issues.apache.org/jira/browse/SPARK-49506
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-49307) Support Kryo Serialization with AgnosticEncoders

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49307:
---
Labels: pull-request-available  (was: )

> Support Kryo Serialization with AgnosticEncoders
> 
>
> Key: SPARK-49307
> URL: https://issues.apache.org/jira/browse/SPARK-49307
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>
> Add support for Kryo based serialization to Agnostic encoders. This will 
> allow us to port the entire Encoders class from sql/core to sql/api.
> Unfortunately supporting connect is not really possible at this moment. We 
> cannot share the configuration of the Kryo objects between the server and 
> connect. This is not possible due to the fact that connect - by design - does 
> not have all classes needed on its classpath. This makes constructing the 
> same configuration (with the same class ids) almost impossible. On top of 
> this, backwards compatibility will be a problem.
> For connect the only way forward is to have a separately configured version 
> of the kryo serializer that leverages hard coded class ids. That is probably 
> going to need some form of configurability.



[jira] [Updated] (SPARK-48965) toJSON produces wrong values if DecimalType information is lost in as[Product]

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48965:
---
Labels: correctness pull-request-available  (was: correctness)

> toJSON produces wrong values if DecimalType information is lost in as[Product]
> --
>
> Key: SPARK-48965
> URL: https://issues.apache.org/jira/browse/SPARK-48965
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.5.1
>Reporter: Dmitry Lapshin
>Priority: Major
>  Labels: correctness, pull-request-available
>
> Consider this example:
> {code:scala}
> package com.jetbrains.jetstat.etl
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types.DecimalType
> object A {
>   case class Example(x: BigDecimal)
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder()
>   .master("local[1]")
>   .getOrCreate()
> import spark.implicits._
> val originalRaw = BigDecimal("123.456")
> val original = Example(originalRaw)
> val ds1 = spark.createDataset(Seq(original))
> val ds2 = ds1
>   .withColumn("x", $"x" cast DecimalType(12, 6))
> val ds3 = ds2
>   .as[Example]
> println(s"DS1: schema=${ds1.schema}, 
> encoder.schema=${ds1.encoder.schema}")
> println(s"DS2: schema=${ds1.schema}, 
> encoder.schema=${ds2.encoder.schema}")
> println(s"DS3: schema=${ds1.schema}, 
> encoder.schema=${ds3.encoder.schema}")
> val json1 = ds1.toJSON.collect().head
> val json2 = ds2.toJSON.collect().head
> val json3 = ds3.toJSON.collect().head
> val collect1 = ds1.collect().head
> val collect2_ = ds2.collect().head
> val collect2 = collect2_.getDecimal(collect2_.fieldIndex("x"))
> val collect3 = ds3.collect().head
> println(s"Original: $original (scale = ${original.x.scale}, precision = 
> ${original.x.precision})")
> println(s"Collect1: $collect1 (scale = ${collect1.x.scale}, precision = 
> ${collect1.x.precision})")
> println(s"Collect2: $collect2 (scale = ${collect2.scale}, precision = 
> ${collect2.precision})")
> println(s"Collect3: $collect3 (scale = ${collect3.x.scale}, precision = 
> ${collect3.x.precision})")
> println(s"json1: $json1")
> println(s"json2: $json2")
> println(s"json3: $json3")
>   }
> }
> {code}
> Running it you'd see that json3 contains very much wrong data. After a bit of 
> debugging, and sorry since I'm bad with Spark internals, I've found that:
>  * In-memory representation of the data in this example used {{UnsafeRow}}, 
> whose {{.getDecimal}} uses compression to store small Decimal values as 
> longs, but doesn't remember decimal sizing parameters,
>  * However, there are at least two sources for precision & scale to pass to 
> that method: {{Dataset.schema}} (which is based on query execution, always 
> contains 38,18 for me) and {{Dataset.encoder.schema}} (that gets updated in 
> `ds2` to 12,6 but then is reset in `ds3`). Also, there is a 
> {{Dataset.deserializer}} that seems to be combining those two non-trivially.
>  * This doesn't seem to affect {{Dataset.collect()}} methods since they use 
> {{deserializer}}, but {{Dataset.toJSON}} only uses the first schema.
> Seems to me that either {{.toJSON}} should be more aware of what's going on 
> or {{.as[]}} should be doing something else.
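
The effect the reporter describes, a compactly stored decimal read back with the wrong scale, can be illustrated outside Spark with plain `java.math.BigDecimal`. This is only a sketch of the arithmetic under the reporter's hypothesis, not of Spark's `UnsafeRow` or `toJSON` internals; the object and method names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical sketch of the arithmetic only; not Spark internals.
object DecimalScaleSketch {
  // Rebuild a decimal from a compact unscaled long and a caller-supplied scale.
  def fromUnscaled(unscaled: Long, scale: Int): JBigDecimal =
    JBigDecimal.valueOf(unscaled, scale)

  def main(args: Array[String]): Unit = {
    // 123.456 held with scale 6, as after a cast to DecimalType(12, 6).
    val stored = new JBigDecimal("123.456").setScale(6)
    val unscaled = stored.unscaledValue().longValueExact()
    println(fromUnscaled(unscaled, 6))  // read back with the scale it was written with
    println(fromUnscaled(unscaled, 18)) // read back assuming scale 18 instead
  }
}
```

The same unscaled long yields the original value when read with scale 6 and a value many orders of magnitude smaller when read with scale 18, which matches the kind of corruption described for `json3`.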



[jira] [Updated] (SPARK-49413) Create shared RuntimeConf interface

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49413:
---
Labels: pull-request-available  (was: )

> Create shared RuntimeConf interface
> ---
>
> Key: SPARK-49413
> URL: https://issues.apache.org/jira/browse/SPARK-49413
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>
> Create a shared RuntimeConf interface in org.apache.spark.sql that is shared 
> between Classic and Connect.



[jira] [Updated] (SPARK-49504) Add `jjwt` profile

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49504:
---
Labels: pull-request-available  (was: )

> Add `jjwt` profile
> --
>
> Key: SPARK-49504
> URL: https://issues.apache.org/jira/browse/SPARK-49504
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> This issue aims to add a new profile `jjwt` to provide `jjwt-impl` and 
> `jjwt-jackson` jar files in a Spark distribution.



[jira] [Updated] (SPARK-49467) Add support for state data source reader and list state

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49467:
---
Labels: pull-request-available  (was: )

> Add support for state data source reader and list state
> ---
>
> Key: SPARK-49467
> URL: https://issues.apache.org/jira/browse/SPARK-49467
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-27085) Migrate CSV to File Data Source V2

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-27085:
---
Labels: pull-request-available  (was: )

> Migrate CSV to File Data Source V2
> --
>
> Key: SPARK-27085
> URL: https://issues.apache.org/jira/browse/SPARK-27085
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0
>
>







[jira] [Updated] (SPARK-49502) Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49502:
---
Labels: pull-request-available  (was: )

> Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle
> --
>
> Key: SPARK-49502
> URL: https://issues.apache.org/jira/browse/SPARK-49502
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-49501) Catalog createTable API is double-escaping paths

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49501:
---
Labels: pull-request-available  (was: )

> Catalog createTable API is double-escaping paths
> 
>
> Key: SPARK-49501
> URL: https://issues.apache.org/jira/browse/SPARK-49501
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Christos Stavrakakis
>Priority: Major
>  Labels: pull-request-available
>
> Creating an external table using {{spark.catalog.createTable}} results in 
> incorrect escaping of special characters in paths.
> Consider the following code:
> {{spark.catalog.createTable("testTable", source = "parquet", 
> schema = new StructType().add("id", "int"), description = "", options = 
> Map("path" -> "/tmp/test table"))}}
> The above call creates a table that is stored in {{/tmp/test%20table}} 
> instead of {{/tmp/test table}}. Note that this behaviour is different 
> from the SQL API, e.g. {{create table testTable(id int) using parquet 
> location '/tmp/test table'}}.
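
The symptom described above can be reproduced in isolation with plain percent-encoding. The sketch below is only an illustration of how escaping a path twice and decoding it once leaves a literal `%20` behind; it is not Spark's code path, and the helper name `escape_path` is hypothetical.

```python
from urllib.parse import quote, unquote


def escape_path(path: str) -> str:
    """Percent-encode a filesystem path once; '/' is left untouched."""
    return quote(path)


raw = "/tmp/test table"

# First escape: the space becomes %20.
once = escape_path(raw)

# Second (unintended) escape: the '%' itself is encoded as %25.
twice = escape_path(once)

# A single decode removes only one layer, so the stored location
# still contains a literal "%20" instead of a space.
stored = unquote(twice)

print(once)
print(twice)
print(stored)
```

Under this assumption, a consumer that decodes the location exactly once sees the raw path only if the producer escaped it exactly once.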






[jira] [Updated] (SPARK-49414) Create a shared DataFrameReader interface

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49414:
---
Labels: pull-request-available  (was: )

> Create a shared DataFrameReader interface
> -
>
> Key: SPARK-49414
> URL: https://issues.apache.org/jira/browse/SPARK-49414
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>
> Create a shared DataFrameReader in org.apache.spark.sql.api that is shared 
> between Classic and Connect.






[jira] [Updated] (SPARK-48698) Support analyze column stats for tables with collated columns

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48698:
---
Labels: pull-request-available  (was: )

> Support analyze column stats for tables with collated columns
> -
>
> Key: SPARK-48698
> URL: https://issues.apache.org/jira/browse/SPARK-48698
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
>
> The following sequence fails:
> {code:java}
> > create table t(s string collate utf8_lcase) using parquet;
> > insert into t values ('A');
> > analyze table t compute statistics for all columns;
> [UNSUPPORTED_FEATURE.ANALYZE_UNSUPPORTED_COLUMN_TYPE] The feature is not 
> supported: The ANALYZE TABLE FOR COLUMNS command does not support the type 
> "STRING COLLATE UTF8_LCASE" of the column `s` in the table 
> `spark_catalog`.`default`.`t`. SQLSTATE: 0A000
>  {code}
> Users should be able to run ANALYZE commands on tables that have columns 
> with collated types.






[jira] [Updated] (SPARK-48348) [M0] Support for LEAVE statement

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48348:
---
Labels: pull-request-available  (was: )

> [M0] Support for LEAVE statement
> 
>
> Key: SPARK-48348
> URL: https://issues.apache.org/jira/browse/SPARK-48348
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: David Milicevic
>Priority: Major
>  Labels: pull-request-available
>
> Add support for LEAVE statement in WHILE (and other) loops to SQL scripting 
> parser & interpreter.
> This is the same functionality as BREAK in other languages.
>  
> For more details:
>  * Design doc in parent Jira item.
>  * [SQL ref 
> spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit].






[jira] [Assigned] (SPARK-49202) Register `binary_search_for_buckets` in the Scala side

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49202:
--

Assignee: Ruifeng Zheng  (was: Apache Spark)

> Register `binary_search_for_buckets` in the Scala side
> --
>
> Key: SPARK-49202
> URL: https://issues.apache.org/jira/browse/SPARK-49202
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-49202) Register `binary_search_for_buckets` in the Scala side

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-49202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49202:
--

Assignee: Apache Spark  (was: Ruifeng Zheng)

> Register `binary_search_for_buckets` in the Scala side
> --
>
> Key: SPARK-49202
> URL: https://issues.apache.org/jira/browse/SPARK-49202
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>






