[jira] [Assigned] (SPARK-45287) Add Java 21 benchmark result
[ https://issues.apache.org/jira/browse/SPARK-45287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45287: - Assignee: Dongjoon Hyun > Add Java 21 benchmark result > > > Key: SPARK-45287 > URL: https://issues.apache.org/jira/browse/SPARK-45287 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45287) Add Java 21 benchmark result
[ https://issues.apache.org/jira/browse/SPARK-45287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45287. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43065 [https://github.com/apache/spark/pull/43065] > Add Java 21 benchmark result > > > Key: SPARK-45287 > URL: https://issues.apache.org/jira/browse/SPARK-45287 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44119) Drop K8s v1.25 and lower version support
[ https://issues.apache.org/jira/browse/SPARK-44119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44119. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43069 [https://github.com/apache/spark/pull/43069] > Drop K8s v1.25 and lower version support > > > Key: SPARK-44119 > URL: https://issues.apache.org/jira/browse/SPARK-44119 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > *1. Default K8s Version in Public Cloud environments* > The default K8s versions of public cloud providers are already K8s 1.27+. > - EKS: v1.27 (Default) > - GKE: v1.27 (Stable), v1.27 (Regular), v1.27 (Rapid) > *2. End Of Support* > In addition, K8s 1.25 and older will reach EOL by the time Apache Spark > 4.0.0 arrives in June 2024. K8s 1.26 will also reach EOL in June. > || K8s || AKS || GKE || EKS || > | 1.27 | 2024-07 | 2024-08 | 2024-07 | > | 1.26 | 2024-03 | 2024-06 | 2024-06 | > | 1.25 | 2023-12 | 2024-02 | 2024-05 | > | 1.24 | 2023-07 | 2023-10 | 2024-01 | > - [AKS EOL > Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar) > - [GKE EOL > Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule) > - [EKS EOL > Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44119) Drop K8s v1.25 and lower version support
[ https://issues.apache.org/jira/browse/SPARK-44119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44119: - Assignee: Dongjoon Hyun > Drop K8s v1.25 and lower version support > > > Key: SPARK-44119 > URL: https://issues.apache.org/jira/browse/SPARK-44119 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > *1. Default K8s Version in Public Cloud environments* > The default K8s versions of public cloud providers are already K8s 1.27+. > - EKS: v1.27 (Default) > - GKE: v1.27 (Stable), v1.27 (Regular), v1.27 (Rapid) > *2. End Of Support* > In addition, K8s 1.25 and older will reach EOL by the time Apache Spark > 4.0.0 arrives in June 2024. K8s 1.26 will also reach EOL in June. > || K8s || AKS || GKE || EKS || > | 1.27 | 2024-07 | 2024-08 | 2024-07 | > | 1.26 | 2024-03 | 2024-06 | 2024-06 | > | 1.25 | 2023-12 | 2024-02 | 2024-05 | > | 1.24 | 2023-07 | 2023-10 | 2024-01 | > - [AKS EOL > Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar) > - [GKE EOL > Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule) > - [EKS EOL > Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44118) Support K8s scheduling gates
[ https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44118: - Assignee: (was: Dongjoon Hyun) > Support K8s scheduling gates > > > Key: SPARK-44118 > URL: https://issues.apache.org/jira/browse/SPARK-44118 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/ > - Kubernetes v1.26 [alpha] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44118) Support K8s scheduling gates
[ https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44118: - Assignee: Dongjoon Hyun > Support K8s scheduling gates > > > Key: SPARK-44118 > URL: https://issues.apache.org/jira/browse/SPARK-44118 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/ > - Kubernetes v1.26 [alpha] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44119) Drop K8s v1.25 and lower version support
[ https://issues.apache.org/jira/browse/SPARK-44119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44119: -- Description: *1. Default K8s Version in Public Cloud environments* The default K8s versions of public cloud providers are already K8s 1.27+. - EKS: v1.27 (Default) - GKE: v1.27 (Stable), v1.27 (Regular), v1.27 (Rapid) *2. End Of Support* In addition, K8s 1.25 and older will reach EOL by the time Apache Spark 4.0.0 arrives in June 2024. K8s 1.26 will also reach EOL in June. || K8s || AKS || GKE || EKS || | 1.27 | 2024-07 | 2024-08 | 2024-07 | | 1.26 | 2024-03 | 2024-06 | 2024-06 | | 1.25 | 2023-12 | 2024-02 | 2024-05 | | 1.24 | 2023-07 | 2023-10 | 2024-01 | - [AKS EOL Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar) - [GKE EOL Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule) - [EKS EOL Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar) was:EKS K8s v1.25 will reach the End-Of-Support on May 2024. > Drop K8s v1.25 and lower version support > > > Key: SPARK-44119 > URL: https://issues.apache.org/jira/browse/SPARK-44119 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > *1. Default K8s Version in Public Cloud environments* > The default K8s versions of public cloud providers are already K8s 1.27+. > - EKS: v1.27 (Default) > - GKE: v1.27 (Stable), v1.27 (Regular), v1.27 (Rapid) > *2. End Of Support* > In addition, K8s 1.25 and older will reach EOL by the time Apache Spark > 4.0.0 arrives in June 2024. K8s 1.26 will also reach EOL in June. 
> || K8s || AKS || GKE || EKS || > | 1.27 | 2024-07 | 2024-08 | 2024-07 | > | 1.26 | 2024-03 | 2024-06 | 2024-06 | > | 1.25 | 2023-12 | 2024-02 | 2024-05 | > | 1.24 | 2023-07 | 2023-10 | 2024-01 | > - [AKS EOL > Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar) > - [GKE EOL > Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule) > - [EKS EOL > Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45288) Remove outdated benchmark result files `jdk1[17]*results.txt`
[ https://issues.apache.org/jira/browse/SPARK-45288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45288. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43066 [https://github.com/apache/spark/pull/43066] > Remove outdated benchmark result files `jdk1[17]*results.txt` > - > > Key: SPARK-45288 > URL: https://issues.apache.org/jira/browse/SPARK-45288 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45288) Remove outdated benchmark result files `jdk1[17]*results.txt`
[ https://issues.apache.org/jira/browse/SPARK-45288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45288: Assignee: Dongjoon Hyun > Remove outdated benchmark result files `jdk1[17]*results.txt` > - > > Key: SPARK-45288 > URL: https://issues.apache.org/jira/browse/SPARK-45288 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44119) Drop K8s v1.25 and lower version support
[ https://issues.apache.org/jira/browse/SPARK-44119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44119: --- Labels: pull-request-available (was: ) > Drop K8s v1.25 and lower version support > > > Key: SPARK-44119 > URL: https://issues.apache.org/jira/browse/SPARK-44119 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > EKS K8s v1.25 will reach the End-Of-Support on May 2024. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45274) Implementation of a new DAG drawing approach to avoid fork
[ https://issues.apache.org/jira/browse/SPARK-45274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45274. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43053 [https://github.com/apache/spark/pull/43053] > Implementation of a new DAG drawing approach to avoid fork > --- > > Key: SPARK-45274 > URL: https://issues.apache.org/jira/browse/SPARK-45274 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45274) Implementation of a new DAG drawing approach to avoid fork
[ https://issues.apache.org/jira/browse/SPARK-45274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45274: - Assignee: Kent Yao > Implementation of a new DAG drawing approach to avoid fork > --- > > Key: SPARK-45274 > URL: https://issues.apache.org/jira/browse/SPARK-45274 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44550) Wrong semantics for null IN (empty list)
[ https://issues.apache.org/jira/browse/SPARK-44550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44550: --- Labels: pull-request-available (was: ) > Wrong semantics for null IN (empty list) > > > Key: SPARK-44550 > URL: https://issues.apache.org/jira/browse/SPARK-44550 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > > {{null IN (empty list)}} incorrectly evaluates to null, when it should > evaluate to false. (The reason it should be false is because a IN (b1, b2) is > defined as a = b1 OR a = b2, and an empty IN list is treated as an empty OR > which is false. This is specified by ANSI SQL.) > Many places in Spark execution (In, InSet, InSubquery) and optimization > (OptimizeIn, NullPropagation) implemented this wrong behavior. Also note that > the Spark behavior for the null IN (empty list) is inconsistent in some > places - literal IN lists generally return null (incorrect), while IN/NOT IN > subqueries mostly return false/true, respectively (correct) in this case. > This is a longstanding correctness issue which has existed since null support > for IN expressions was first added to Spark. > Doc with more details: > [https://docs.google.com/document/d/1k8AY8oyT-GI04SnP7eXttPDnDj-Ek-c3luF2zL6DPNU/edit] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
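The ANSI definition quoted in the ticket above (a IN (b1, b2) is defined as a = b1 OR a = b2, and an empty OR is false) can be sketched in plain Python three-valued logic. This is an illustrative model of the ANSI rule, not Spark's actual implementation; None stands in for SQL NULL:

```python
def sql_in(a, in_list):
    """Model of ANSI SQL `a IN (b1, ..., bn)` under three-valued logic.

    None plays the role of SQL NULL; the result is True, False,
    or None (UNKNOWN).
    """
    result = False  # identity of OR: an empty IN list yields False
    for b in in_list:
        # NULL compared with anything is UNKNOWN (None)
        eq = None if a is None or b is None else (a == b)
        if eq is True:
            return True  # TRUE dominates OR
        if eq is None:
            result = None  # remember we saw an UNKNOWN
    return result

print(sql_in(None, []))      # False: empty IN list, even for a NULL left side
print(sql_in(None, [1, 2]))  # None (UNKNOWN)
print(sql_in(1, [1, None]))  # True
```

Under this model, null IN () is false while null IN (1, 2) is UNKNOWN, which is exactly the distinction the ticket says the literal-IN-list code paths got wrong.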
[jira] [Updated] (SPARK-42669) Short circuit local relation rpcs
[ https://issues.apache.org/jira/browse/SPARK-42669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42669: --- Labels: pull-request-available (was: ) > Short circuit local relation rpcs > - > > Key: SPARK-42669 > URL: https://issues.apache.org/jira/browse/SPARK-42669 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > > Operations on LocalRelation can mostly be done locally (without sending > rpcs). We should leverage this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42830) Link skipped stages on Spark UI
[ https://issues.apache.org/jira/browse/SPARK-42830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42830: --- Labels: pull-request-available (was: ) > Link skipped stages on Spark UI > --- > > Key: SPARK-42830 > URL: https://issues.apache.org/jira/browse/SPARK-42830 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Priority: Major > Labels: pull-request-available > > Add a link to the skipped Spark stages so that it's easier to find the > execution details on the UI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42890) Add Identifier to the InMemoryTableScan node on the SQL page
[ https://issues.apache.org/jira/browse/SPARK-42890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42890: --- Labels: pull-request-available (was: ) > Add Identifier to the InMemoryTableScan node on the SQL page > > > Key: SPARK-42890 > URL: https://issues.apache.org/jira/browse/SPARK-42890 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.2 >Reporter: Yian Liou >Priority: Major > Labels: pull-request-available > > On the SQL page in the Web UI, there is no distinction for which > InMemoryTableScan is being used at a specific point in the DAG. This Jira > aims to add a repeat identifier to distinguish which InMemoryTableScan is > being used at a certain location. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45057) Deadlock caused by rdd replication level of 2
[ https://issues.apache.org/jira/browse/SPARK-45057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45057: --- Labels: pull-request-available (was: ) > Deadlock caused by rdd replication level of 2 > - > > Key: SPARK-45057 > URL: https://issues.apache.org/jira/browse/SPARK-45057 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Zhongwei Zhu >Priority: Major > Labels: pull-request-available > > > When 2 tasks try to compute the same RDD with a replication level of 2 while running > on only 2 executors, a deadlock will happen. > A task only releases its write lock after writing the block to the local machine and replicating it to the > remote executor. > > ||Time||Exe 1 (Task Thread T1)||Exe 1 (Shuffle Server Thread T2)||Exe 2 (Task > Thread T3)||Exe 2 (Shuffle Server Thread T4)|| > |T0|write lock of rdd| | | | > |T1| | |write lock of rdd| | > |T2|replicate -> UploadBlockSync (blocked by T4)| | | | > |T3| | | |Received UploadBlock request from T1 (blocked by T4)| > |T4| | |replicate -> UploadBlockSync (blocked by T2)| | > |T5| |Received UploadBlock request from T3 (blocked by T1)| | | > |T6|Deadlock|Deadlock|Deadlock|Deadlock| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
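The timeline table in the ticket above describes a circular wait: each of the four threads is blocked on a resource held by the next (reading the "(blocked by T4)" entry in T4's own column as a typo for T3, the holder of Exe 2's write lock). A cycle in the wait-for graph is the classic deadlock condition; a minimal sketch using the table's thread names (this models the reported scenario, it is not Spark code):

```python
# Wait-for edges as read from the table: "X": "Y" means X is blocked on Y.
waits_for = {
    "T1": "T4",  # Exe 1 task thread waits on Exe 2's shuffle server (UploadBlockSync)
    "T4": "T3",  # which waits on Exe 2's rdd write lock, held by T3
    "T3": "T2",  # which waits on Exe 1's shuffle server (UploadBlockSync)
    "T2": "T1",  # which waits on Exe 1's rdd write lock, held by T1
}

def find_cycle(graph, start):
    """Follow wait-for edges from `start`; return the cycle if a node repeats."""
    seen, node = [], start
    while node in graph and node not in seen:
        seen.append(node)
        node = graph[node]
    return seen if node in seen else None

print(find_cycle(waits_for, "T1"))  # all four threads form one cycle -> deadlock
```

Because every thread in the cycle holds something another one needs and none can proceed, releasing the write lock before replication (or breaking the cycle some other way) is what a fix has to achieve.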
[jira] [Assigned] (SPARK-45285) Remove deprecated `Runtime.getRuntime.exec(String)` API usage
[ https://issues.apache.org/jira/browse/SPARK-45285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45285: - Assignee: Dongjoon Hyun > Remove deprecated `Runtime.getRuntime.exec(String)` API usage > - > > Key: SPARK-45285 > URL: https://issues.apache.org/jira/browse/SPARK-45285 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45285) Remove deprecated `Runtime.getRuntime.exec(String)` API usage
[ https://issues.apache.org/jira/browse/SPARK-45285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45285. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43062 [https://github.com/apache/spark/pull/43062] > Remove deprecated `Runtime.getRuntime.exec(String)` API usage > - > > Key: SPARK-45285 > URL: https://issues.apache.org/jira/browse/SPARK-45285 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45284) Update SparkR minimum SystemRequirements to Java 17
[ https://issues.apache.org/jira/browse/SPARK-45284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45284: - Assignee: Dongjoon Hyun > Update SparkR minimum SystemRequirements to Java 17 > --- > > Key: SPARK-45284 > URL: https://issues.apache.org/jira/browse/SPARK-45284 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45284) Update SparkR minimum SystemRequirements to Java 17
[ https://issues.apache.org/jira/browse/SPARK-45284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45284. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43060 [https://github.com/apache/spark/pull/43060] > Update SparkR minimum SystemRequirements to Java 17 > --- > > Key: SPARK-45284 > URL: https://issues.apache.org/jira/browse/SPARK-45284 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45265) Support Hive 4.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45265: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Bug) > Support Hive 4.0 metastore > -- > > Key: SPARK-45265 > URL: https://issues.apache.org/jira/browse/SPARK-45265 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Labels: pull-request-available > > Although Hive 4.0.0 is still in beta, I would like to work on this as Hive 4.0.0 > will support the pushdown of partition column filters with > VARCHAR/CHAR types. > For details please see HIVE-26661: Support partition filter for char and > varchar types on Hive metastore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45288) Remove outdated benchmark result files `jdk1[17]*results.txt`
[ https://issues.apache.org/jira/browse/SPARK-45288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45288: --- Labels: pull-request-available (was: ) > Remove outdated benchmark result files `jdk1[17]*results.txt` > - > > Key: SPARK-45288 > URL: https://issues.apache.org/jira/browse/SPARK-45288 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45288) Remove outdated benchmark result files `jdk1[17]*results.txt`
Dongjoon Hyun created SPARK-45288: - Summary: Remove outdated benchmark result files `jdk1[17]*results.txt` Key: SPARK-45288 URL: https://issues.apache.org/jira/browse/SPARK-45288 Project: Spark Issue Type: Test Components: Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45287) Add Java 21 benchmark result
[ https://issues.apache.org/jira/browse/SPARK-45287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45287: --- Labels: pull-request-available (was: ) > Add Java 21 benchmark result > > > Key: SPARK-45287 > URL: https://issues.apache.org/jira/browse/SPARK-45287 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45287) Add Java 21 benchmark result
Dongjoon Hyun created SPARK-45287: - Summary: Add Java 21 benchmark result Key: SPARK-45287 URL: https://issues.apache.org/jira/browse/SPARK-45287 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45265) Support Hive 4.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45265: --- Labels: pull-request-available (was: ) > Support Hive 4.0 metastore > -- > > Key: SPARK-45265 > URL: https://issues.apache.org/jira/browse/SPARK-45265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Labels: pull-request-available > > Although Hive 4.0.0 is still in beta, I would like to work on this as Hive 4.0.0 > will support the pushdown of partition column filters with > VARCHAR/CHAR types. > For details please see HIVE-26661: Support partition filter for char and > varchar types on Hive metastore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43288) DataSourceV2: CREATE TABLE LIKE
[ https://issues.apache.org/jira/browse/SPARK-43288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43288: --- Labels: pull-request-available (was: ) > DataSourceV2: CREATE TABLE LIKE > --- > > Key: SPARK-43288 > URL: https://issues.apache.org/jira/browse/SPARK-43288 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: John Zhuge >Priority: Major > Labels: pull-request-available > > Support CREATE TABLE LIKE in DSv2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39822) Provides a good error during create Index with different dtype elements
[ https://issues.apache.org/jira/browse/SPARK-39822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39822: --- Labels: pull-request-available (was: ) > Provides a good error during create Index with different dtype elements > --- > > Key: SPARK-39822 > URL: https://issues.apache.org/jira/browse/SPARK-39822 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.2 >Reporter: bo zhao >Priority: Minor > Labels: pull-request-available > > PANDAS > > {code:java} > >>> import pandas as pd >>> pd.Index([1,2,'3',4]) Index([1, 2, '3', 4], > >>> dtype='object') >>> > {code} > PYSPARK > > > {code:java} > Using Python version 3.8.13 (default, Jun 29 2022 11:50:19) > Spark context Web UI available at http://172.25.179.45:4042 > Spark context available as 'sc' (master = local[*], app id = > local-1658301116572). > SparkSession available as 'spark'. > >>> from pyspark import pandas as ps > WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It > is required to set this environment variable to '1' in both driver and > executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you > but it does not work if there is a Spark context already launched. 
> >>> ps.Index([1,2,'3',4]) > Traceback (most recent call last): > File "", line 1, in > File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 184, > in __new__ > ps.from_pandas( > File "/home/spark/spark/python/pyspark/pandas/namespace.py", line 155, in > from_pandas > return DataFrame(pd.DataFrame(index=pobj)).index > File "/home/spark/spark/python/pyspark/pandas/frame.py", line 463, in > __init__ > internal = InternalFrame.from_pandas(pdf) > File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1469, in > from_pandas > ) = InternalFrame.prepare_pandas_frame(pdf, > prefer_timestamp_ntz=prefer_timestamp_ntz) > File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1570, in > prepare_pandas_frame > spark_type = infer_pd_series_spark_type(reset_index[col], dtype, > prefer_timestamp_ntz) > File "/home/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 360, in infer_pd_series_spark_type > return from_arrow_type(pa.Array.from_pandas(pser).type, > prefer_timestamp_ntz) > File "pyarrow/array.pxi", line 1033, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 312, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert '3' with type str: tried to > convert to int64 > {code} > I understand that pyspark pandas needs the dtype to be the same, but we need a > good error message that tells the user how to avoid this. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
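The contrast in the ticket above is that pandas silently falls back to dtype='object' for mixed elements, while pandas-on-Spark must infer a single Spark type (via Arrow), which is where the opaque ArrowInvalid surfaces. A friendlier error could name the offending types up front; a plain-Python sketch of such a pre-check (the function name and message are hypothetical, not the actual PySpark fix):

```python
def check_homogeneous(values):
    """Raise a descriptive error if elements have mixed types (None ignored)."""
    types = {type(v) for v in values if v is not None}
    if len(types) > 1:
        names = sorted(t.__name__ for t in types)
        raise TypeError(
            f"Index elements must share one type for Spark type inference; "
            f"got mixed types {names}. Cast them explicitly, e.g. all to str."
        )
    return values

check_homogeneous([1, 2, 4])  # homogeneous: passes through unchanged
try:
    check_homogeneous([1, 2, '3', 4])
except TypeError as e:
    print(e)  # names the mixed types instead of an Arrow conversion error
```

The point of the sketch is only where the check runs: validating before handing the data to Arrow lets the message speak in the user's terms (mixed element types) rather than Arrow's (failed int64 conversion).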
[jira] [Updated] (SPARK-45286) Add back Matomo analytics to release docs
[ https://issues.apache.org/jira/browse/SPARK-45286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-45286: - Target Version/s: 3.4.2, 4.0.0, 3.5.1 (was: 4.0.0) > Add back Matomo analytics to release docs > - > > Key: SPARK-45286 > URL: https://issues.apache.org/jira/browse/SPARK-45286 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Labels: pull-request-available > > We had previously removed Google Analytics from the website and release docs, > per ASF policy: https://github.com/apache/spark/pull/36310 > We just restored analytics using the ASF-hosted Matomo service on the website: > https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 > This change would put the same new tracking code back into the release docs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45286) Add back Matomo analytics to release docs
[ https://issues.apache.org/jira/browse/SPARK-45286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45286: --- Labels: pull-request-available (was: ) > Add back Matomo analytics to release docs > - > > Key: SPARK-45286 > URL: https://issues.apache.org/jira/browse/SPARK-45286 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Labels: pull-request-available > > We had previously removed Google Analytics from the website and release docs, > per ASF policy: https://github.com/apache/spark/pull/36310 > We just restored analytics using the ASF-hosted Matomo service on the website: > https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 > This change would put the same new tracking code back into the release docs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45273) Http header Attack【HttpSecurityFilter】
[ https://issues.apache.org/jira/browse/SPARK-45273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768144#comment-17768144 ] Sean R. Owen commented on SPARK-45273: -- Yep we typically evaluate security reports on priv...@spark.apache.org first, not here > Http header Attack【HttpSecurityFilter】 > -- > > Key: SPARK-45273 > URL: https://issues.apache.org/jira/browse/SPARK-45273 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Major > > There is an HTTP host header attack vulnerability in the target URL -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45273) Http header Attack【HttpSecurityFilter】
[ https://issues.apache.org/jira/browse/SPARK-45273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768140#comment-17768140 ] Bjørn Jørgensen commented on SPARK-45273: - Hi, [~chenyu-opensource] can you take this on mail to secur...@spark.apache.org CC [~srowen] > Http header Attack【HttpSecurityFilter】 > -- > > Key: SPARK-45273 > URL: https://issues.apache.org/jira/browse/SPARK-45273 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Major > > There is an HTTP host header attack vulnerability in the target URL -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45286) Add back Matomo analytics to release docs
Sean R. Owen created SPARK-45286: Summary: Add back Matomo analytics to release docs Key: SPARK-45286 URL: https://issues.apache.org/jira/browse/SPARK-45286 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 4.0.0 Reporter: Sean R. Owen Assignee: Sean R. Owen We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768123#comment-17768123 ] koert kuipers edited comment on SPARK-45282 at 9/22/23 7:04 PM: after reverting SPARK-41048 the issue went away. was (Author: koert): after reverting SPARK-41048 the issue went away. so i think this is the cause. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Major > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exhanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on distributed cluster these settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 100).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
> println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 100 > number of right 100 > number of (left join right) 100 > number of left1 100 > number of right1 100 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
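[Editor's note] The mechanism the report describes — removing a shuffle Exchange while the two cached sides are not actually co-partitioned — can be simulated without Spark. A toy sketch (pure Python, all names hypothetical, not Spark internals): a partition-wise join that pairs partition i of one side with partition i of the other silently loses matches whenever the two sides were partitioned differently.

```python
def partition(rows, num_parts, hash_fn):
    """Hash-partition (key, value) rows into num_parts buckets."""
    parts = [[] for _ in range(num_parts)]
    for key, val in rows:
        parts[hash_fn(key) % num_parts].append((key, val))
    return parts

def zip_join(parts_a, parts_b):
    """Join partition i of A only against partition i of B — what a plan
    that has dropped its Exchange effectively assumes it can do."""
    out = []
    for pa, pb in zip(parts_a, parts_b):
        lookup = dict(pb)
        out += [(k, v, lookup[k]) for k, v in pa if k in lookup]
    return out

rows = [(f"k{i}", i) for i in range(100)]
# Same partitioner on both sides: co-partitioned, join is correct.
same = zip_join(partition(rows, 4, hash), partition(rows, 4, hash))
# Mismatched partitioner on the right side: every key lands one bucket over,
# so the partition-wise join finds no matches at all.
diff = zip_join(partition(rows, 4, hash),
                partition(rows, 4, lambda k: hash(k) + 1))
print(len(same), len(diff))  # 100 0
```

This is why the Exchange elision is only safe when the optimizer's belief about the cached plan's output partitioning matches what was actually materialized.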
[jira] [Updated] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] koert kuipers updated SPARK-45282: -- Description: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exhanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code using scala to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() ) val left1 = left .toDF("key", "value1") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "value2") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code} this produces the following output: {code:java} number of left 100 number of right 100 number of (left join right) 100 number of left1 100 number of right1 100 number of (left1 join right1) 859531 {code} note that the last number (the incorrect one) actually varies depending on settings and cluster size etc. was: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exhanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. 
to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code using scala to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code} this produces the following output: {code:java} number of left 100 number of right 100 number of (left join right) 100 number of left1 100 number of right1 100 number of (left1 join right1) 859531 {code} note that the last number (the incorrect one) actually varies depending on settings and cluster size etc. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or
[jira] [Updated] (SPARK-45285) Remove deprecated `Runtime.getRuntime.exec(String)` API usage
[ https://issues.apache.org/jira/browse/SPARK-45285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45285: --- Labels: pull-request-available (was: ) > Remove deprecated `Runtime.getRuntime.exec(String)` API usage > - > > Key: SPARK-45285 > URL: https://issues.apache.org/jira/browse/SPARK-45285 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45285) Remove deprecated `Runtime.getRuntime.exec(String)` API usage
Dongjoon Hyun created SPARK-45285: - Summary: Remove deprecated `Runtime.getRuntime.exec(String)` API usage Key: SPARK-45285 URL: https://issues.apache.org/jira/browse/SPARK-45285 Project: Spark Issue Type: Improvement Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
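[Editor's note] For context on the cleanup above: `Runtime.exec(String)` splits the command with a naive `StringTokenizer` (quoting is not respected) and the JDK deprecated those overloads in favor of `exec(String[])`, where the caller builds the argument vector explicitly. Python's `subprocess` has the same split between a shell string and an argument list; a small illustration of the principle (not Spark code):

```python
import shlex
import subprocess

# Fragile: naive whitespace splitting treats quotes as literal characters,
# which is essentially what Runtime.exec(String) does in Java.
naive = "echo 'hello world'".split()
# -> ["echo", "'hello", "world'"] — the quoted argument is mangled

# Robust: pass an explicit argument vector (the exec(String[]) analogue),
# or use a real tokenizer like shlex.split for strings you must accept.
good = subprocess.run(["echo", "hello world"], capture_output=True, text=True)
also_good = shlex.split("echo 'hello world'")   # -> ["echo", "hello world"]
print(good.stdout)
```

Building the argument array explicitly removes the ambiguity, which is why the deprecated single-string overload is worth purging from the codebase and tests.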
[jira] [Updated] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] koert kuipers updated SPARK-45282: -- Description: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exhanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code using scala to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code} this produces the following output: {code:java} number of left 100 number of right 100 number of (left join right) 100 number of left1 100 number of right1 100 number of (left1 join right1) 859531 {code} note that the last number (the incorrect one) actually varies depending on settings and cluster size etc. was: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exhanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. 
to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code using scala 2.13 to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code} this produces the following output: {code:java} number of left 100 number of right 100 number of (left join right) 100 number of left1 100 number of right1 100 number of (left1 join right1) 859531 {code} note that the last number (the incorrect one) actually varies depending on settings and cluster size etc. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or
[jira] [Commented] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768123#comment-17768123 ] koert kuipers commented on SPARK-45282: --- after reverting SPARK-41048 the issue went away. so i think this is the cause. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Major > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exhanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on distributed cluster these settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala 2.13 to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 100).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
> println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() > ) > val left1 = left > .toDF("key", "vertex") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "state") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 100 > number of right 100 > number of (left join right) 100 > number of left1 100 > number of right1 100 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42768) Enable cached plan apply AQE by default
[ https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42768: --- Labels: pull-request-available (was: ) > Enable cached plan apply AQE by default > --- > > Key: SPARK-42768 > URL: https://issues.apache.org/jira/browse/SPARK-42768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45284) Update SparkR minimum SystemRequirements to Java 17
[ https://issues.apache.org/jira/browse/SPARK-45284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45284: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Update SparkR minimum SystemRequirements to Java 17 > --- > > Key: SPARK-45284 > URL: https://issues.apache.org/jira/browse/SPARK-45284 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
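[Editor's note] For readers unfamiliar with R packaging: the `SystemRequirements` field lives in the package's `DESCRIPTION` file and is free-text metadata that CRAN and users read to learn about non-R dependencies. The change above amounts to a one-line edit along these lines (illustrative fragment; the exact surrounding fields and prior version bound are assumptions, not copied from SparkR's actual file):

```
Package: SparkR
Title: R Front End for Apache Spark
SystemRequirements: Java (>= 17)
```

Because the field is advisory metadata rather than an enforced constraint, SparkR's own runtime Java-version check still has to agree with it.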
[jira] [Updated] (SPARK-45284) Update SparkR minimum SystemRequirements to Java 17
[ https://issues.apache.org/jira/browse/SPARK-45284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45284: --- Labels: pull-request-available (was: ) > Update SparkR minimum SystemRequirements to Java 17 > --- > > Key: SPARK-45284 > URL: https://issues.apache.org/jira/browse/SPARK-45284 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45077) Upgrade dagre-d3.js from 0.4.3 to 0.6.4
[ https://issues.apache.org/jira/browse/SPARK-45077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45077: --- Labels: pull-request-available (was: ) > Upgrade dagre-d3.js from 0.4.3 to 0.6.4 > --- > > Key: SPARK-45077 > URL: https://issues.apache.org/jira/browse/SPARK-45077 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45284) Update SparkR minimum +SystemRequirements to Java 17
Dongjoon Hyun created SPARK-45284: - Summary: Update SparkR minimum +SystemRequirements to Java 17 Key: SPARK-45284 URL: https://issues.apache.org/jira/browse/SPARK-45284 Project: Spark Issue Type: Improvement Components: R Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45284) Update SparkR minimum SystemRequirements to Java 17
[ https://issues.apache.org/jira/browse/SPARK-45284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45284: -- Summary: Update SparkR minimum SystemRequirements to Java 17 (was: Update SparkR minimum +SystemRequirements to Java 17) > Update SparkR minimum SystemRequirements to Java 17 > --- > > Key: SPARK-45284 > URL: https://issues.apache.org/jira/browse/SPARK-45284 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] koert kuipers updated SPARK-45282: -- Description: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exhanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code using scala 2.13 to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code} this produces the following output: {code:java} number of left 100 number of right 100 number of (left join right) 100 number of left1 100 number of right1 100 number of (left1 join right1) 859531 {code} note that the last number (the incorrect one) actually varies depending on settings and cluster size etc. was: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exhanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. 
to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code} this produces the following output: {code:java} number of left 100 number of right 100 number of (left join right) 100 number of left1 100 number of right1 100 number of (left1 join right1) 859531 {code} note that the last number (the incorrect one) actually varies depending on settings and cluster size etc. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes
[jira] [Updated] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] koert kuipers updated SPARK-45282: -- Description: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in a distributed environment; i cannot replicate it in a unit test. however, i did get it to show up on a hadoop cluster, on kubernetes, and on databricks 13.3. the issue is that records are dropped when two cached dataframes are joined. it seems that in spark 3.4.1 some Exchanges are dropped from the query plan as an optimization, while in spark 3.3.1 these Exchanges are still present. it seems to be an issue with AQE when canChangeCachedPlanOutputPartitioning=true. to reproduce on a distributed cluster, these settings are needed:
{code:java}
spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
spark.sql.adaptive.coalescePartitions.parallelismFirst false
spark.sql.adaptive.enabled true
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true
{code}
code to reproduce:
{code:java}
import java.util.UUID
import org.apache.spark.sql.functions.col
import spark.implicits._

val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist()
val left = data.map(k => (k, 1))
val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!

println("number of left " + left.count())
println("number of right " + right.count())
println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count())

val left1 = left
  .toDF("key", "vertex")
  .repartition(col("key")) // comment out this line to make it work
  .persist()
println("number of left1 " + left1.count())

val right1 = right
  .toDF("key", "state")
  .repartition(col("key")) // comment out this line to make it work
  .persist()
println("number of right1 " + right1.count())

println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result
{code}
this produces the following output:
{code:java}
number of left 100
number of right 100
number of (left join right) 100
number of left1 100
number of right1 100
number of (left1 join right1) 859531
{code}
note that the last number (the incorrect one) varies depending on settings, cluster size, etc.

was: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exchanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code}

> Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Major > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however
[jira] [Created] (SPARK-45283) Make StatusTrackerSuite less fragile
Bo Xiong created SPARK-45283: Summary: Make StatusTrackerSuite less fragile Key: SPARK-45283 URL: https://issues.apache.org/jira/browse/SPARK-45283 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.5.0, 4.0.0 Reporter: Bo Xiong It was discovered from [Github Actions|https://github.com/xiongbo-sjtu/spark/actions/runs/6270601155/job/17028788767] that StatusTrackerSuite can run into random failures because FutureAction.jobIds is not a sorted sequence (by design), as shown in the following stack trace (highlighted in red). The proposed fix is to update the unit test to remove the nondeterministic behavior.
{quote}
[info] StatusTrackerSuite:
[info] - basic status API usage (99 milliseconds)
[info] - getJobIdsForGroup() (56 milliseconds)
[info] - getJobIdsForGroup() with takeAsync() (48 milliseconds)
[info] - getJobIdsForGroup() with takeAsync() across multiple partitions (58 milliseconds)
[info] - getJobIdsForTag() *** FAILED *** (10 seconds, 77 milliseconds)
{color:#FF}[info] The code passed to eventually never returned normally. Attempted 651 times over 10.00505994401 seconds. Last failure message: Set(3, 2, 1) was not equal to Set(1, 2).
(StatusTrackerSuite.scala:148){color} [info] org.scalatest.exceptions.TestFailedDueToTimeoutException: [info] at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:219) [info] at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226) [info] at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:348) [info] at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:347) [info] at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457) [info] at org.apache.spark.StatusTrackerSuite.$anonfun$new$21(StatusTrackerSuite.scala:148) [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) [info] at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) [info] at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) [info] at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) [info] at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) [info] at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) [info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) [info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) [info] at scala.collection.immutable.List.foreach(List.scala:333) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) [info] at org.scalatest.Suite.run(Suite.scala:1114) [info] at org.scalatest.Suite.run$(Suite.scala:1096) [info] at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) [info] at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) [info] at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) [info] at
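The proposed fix — asserting on job IDs order-insensitively instead of as an ordered sequence — can be sketched as follows. This is an illustrative Java model of the flakiness, not the actual Scala test code; `observed` stands in for a hypothetical arrival order of `FutureAction.jobIds`:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class JobIdCompare {
    public static void main(String[] args) {
        // jobIds is not a sorted sequence by design, so asserting on it as an
        // ordered collection depends on scheduling and is flaky.
        List<Integer> observed = Arrays.asList(3, 1, 2); // hypothetical arrival order
        List<Integer> expected = Arrays.asList(1, 2, 3);
        // Order-sensitive comparison can fail nondeterministically:
        System.out.println(observed.equals(expected)); // false here
        // Order-insensitive comparison is deterministic:
        System.out.println(new HashSet<>(observed).equals(new HashSet<>(expected))); // true
    }
}
```

Comparing as sets (or sorting before comparing) removes the dependence on job-submission order without weakening what the test checks.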
[jira] [Updated] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] koert kuipers updated SPARK-45282: -- Description: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in a distributed environment; i cannot replicate it in a unit test. however, i did get it to show up on a hadoop cluster, on kubernetes, and on databricks 13.3. the issue is that records are dropped when two cached dataframes are joined. it seems that in spark 3.4.1 some Exchanges are dropped from the query plan as an optimization, while in spark 3.3.1 these Exchanges are still present. it seems to be an issue with AQE when canChangeCachedPlanOutputPartitioning=true. to reproduce on a distributed cluster, these settings are needed:
{code:java}
spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
spark.sql.adaptive.coalescePartitions.parallelismFirst false
spark.sql.adaptive.enabled true
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true
{code}
code to reproduce:
{code:java}
import java.util.UUID
import org.apache.spark.sql.functions.col
import spark.implicits._

val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist()
val left = data.map(k => (k, 1))
val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!

println("number of left " + left.count())
println("number of right " + right.count())
println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count())

val left1 = left
  .toDF("key", "vertex")
  .repartition(col("key")) // comment out this line to make it work
  .persist()
println("number of left1 " + left1.count())

val right1 = right
  .toDF("key", "state")
  .repartition(col("key")) // comment out this line to make it work
  .persist()
println("number of right1 " + right1.count())

println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result
{code}

was: we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in distributed environment. i cannot replicate in unit test. however i did get it to show up on hadoop cluster, kubernetes, and on databricks 13.3 the issue is that records are dropped when two cached dataframes are joined. it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an optimization while in spark 3.3.1 these Exchanges are still present. it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. to reproduce on distributed cluster these settings needed are: {code:java} spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 spark.sql.adaptive.coalescePartitions.parallelismFirst false spark.sql.adaptive.enabled true spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} code to reproduce is: {code:java} import java.util.UUID import org.apache.spark.sql.functions.col import spark.implicits._ val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist() val left = data.map(k => (k, 1)) val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! println("number of left " + left.count()) println("number of right " + right.count()) println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count() ) val left1 = left .toDF("key", "vertex") .repartition(col("key")) // comment out this line to make it work .persist() println("number of left1 " + left1.count()) val right1 = right .toDF("key", "state") .repartition(col("key")) // comment out this line to make it work .persist() println("number of right1 " + right1.count()) println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result{code}

> Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Major > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exchanges are still present. it seems > to
[jira] [Created] (SPARK-45282) Join loses records for cached datasets
koert kuipers created SPARK-45282: - Summary: Join loses records for cached datasets Key: SPARK-45282 URL: https://issues.apache.org/jira/browse/SPARK-45282 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1 Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or databricks 13.3 Reporter: koert kuipers we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is not present on spark 3.3.1. it only shows up in a distributed environment; i cannot replicate it in a unit test. however, i did get it to show up on a hadoop cluster, on kubernetes, and on databricks 13.3. the issue is that records are dropped when two cached dataframes are joined. it seems that in spark 3.4.1 some Exchanges are dropped from the query plan as an optimization, while in spark 3.3.1 these Exchanges are still present. it seems to be an issue with AQE when canChangeCachedPlanOutputPartitioning=true. to reproduce on a distributed cluster, these settings are needed:
{code:java}
spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
spark.sql.adaptive.coalescePartitions.parallelismFirst false
spark.sql.adaptive.enabled true
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true
{code}
code to reproduce:
{code:java}
import java.util.UUID
import org.apache.spark.sql.functions.col
import spark.implicits._

val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist()
val left = data.map(k => (k, 1))
val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!

println("number of left " + left.count())
println("number of right " + right.count())
println("number of (left join right) " + left.toDF("key", "vertex").join(right.toDF("key", "state"), "key").count())

val left1 = left
  .toDF("key", "vertex")
  .repartition(col("key")) // comment out this line to make it work
  .persist()
println("number of left1 " + left1.count())

val right1 = right
  .toDF("key", "state")
  .repartition(col("key")) // comment out this line to make it work
  .persist()
println("number of right1 " + right1.count())

println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45281) Update BenchmarkBase to use Java 17 as the base version
[ https://issues.apache.org/jira/browse/SPARK-45281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45281. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43059 [https://github.com/apache/spark/pull/43059] > Update BenchmarkBase to use Java 17 as the base version > --- > > Key: SPARK-45281 > URL: https://issues.apache.org/jira/browse/SPARK-45281 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45281) Update BenchmarkBase to use Java 17 as the base version
[ https://issues.apache.org/jira/browse/SPARK-45281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45281: - Assignee: Dongjoon Hyun > Update BenchmarkBase to use Java 17 as the base version > --- > > Key: SPARK-45281 > URL: https://issues.apache.org/jira/browse/SPARK-45281 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45277) Install Java 17 for Windows SparkR test
[ https://issues.apache.org/jira/browse/SPARK-45277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45277: - Assignee: Yang Jie > Install Java 17 for Windows SparkR test > --- > > Key: SPARK-45277 > URL: https://issues.apache.org/jira/browse/SPARK-45277 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45277) Install Java 17 for Windows SparkR test
[ https://issues.apache.org/jira/browse/SPARK-45277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45277. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43056 [https://github.com/apache/spark/pull/43056] > Install Java 17 for Windows SparkR test > --- > > Key: SPARK-45277 > URL: https://issues.apache.org/jira/browse/SPARK-45277 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45281) Update BenchmarkBase to use Java 17 as the base version
[ https://issues.apache.org/jira/browse/SPARK-45281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45281: --- Labels: pull-request-available (was: ) > Update BenchmarkBase to use Java 17 as the base version > --- > > Key: SPARK-45281 > URL: https://issues.apache.org/jira/browse/SPARK-45281 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45281) Update BenchmarkBase to use Java 17 as the base version
Dongjoon Hyun created SPARK-45281: - Summary: Update BenchmarkBase to use Java 17 as the base version Key: SPARK-45281 URL: https://issues.apache.org/jira/browse/SPARK-45281 Project: Spark Issue Type: Improvement Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45256) Arrow DurationWriter fails when vector is at capacity
[ https://issues.apache.org/jira/browse/SPARK-45256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45256. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43035 [https://github.com/apache/spark/pull/43035] > Arrow DurationWriter fails when vector is at capacity > - > > Key: SPARK-45256 > URL: https://issues.apache.org/jira/browse/SPARK-45256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.5.0, 3.5.1 >Reporter: Sander Goos >Assignee: Sander Goos >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The DurationWriter fails if more values are written than the initial capacity > of the DurationVector (4032). Fix by using `setSafe` instead of `set` method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
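The `set` vs `setSafe` distinction behind this fix can be illustrated with a toy model. This is not the Arrow API — `GrowableLongVector` is a hypothetical stand-in — but it mirrors the contract: `set` assumes capacity was already allocated, while `setSafe` grows the backing buffer before writing, which is why DurationWriter failed once it wrote past the vector's initial capacity (4032):

```java
import java.util.Arrays;

// Toy model (not Arrow code) of a value vector's set()/setSafe() contract.
class GrowableLongVector {
    private long[] data;

    GrowableLongVector(int initialCapacity) {
        data = new long[initialCapacity];
    }

    void set(int index, long value) {
        data[index] = value; // throws past capacity, like Arrow's set()
    }

    void setSafe(int index, long value) {
        if (index >= data.length) { // grow first, like Arrow's setSafe()
            data = Arrays.copyOf(data, Math.max(index + 1, data.length * 2));
        }
        data[index] = value;
    }

    long get(int index) {
        return data[index];
    }
}

public class SetSafeDemo {
    public static void main(String[] args) {
        GrowableLongVector v = new GrowableLongVector(4);
        v.setSafe(100, 42L); // works: the buffer grows on demand
        System.out.println(v.get(100)); // 42
        try {
            new GrowableLongVector(4).set(100, 42L);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("set() past capacity fails");
        }
    }
}
```

Switching the writer to the safe variant trades a capacity check (and occasional reallocation) per write for correctness beyond the initial allocation.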
[jira] [Assigned] (SPARK-45256) Arrow DurationWriter fails when vector is at capacity
[ https://issues.apache.org/jira/browse/SPARK-45256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45256: - Assignee: Sander Goos > Arrow DurationWriter fails when vector is at capacity > - > > Key: SPARK-45256 > URL: https://issues.apache.org/jira/browse/SPARK-45256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.2, 3.4.0, 3.4.1, 3.5.0, 3.5.1 >Reporter: Sander Goos >Assignee: Sander Goos >Priority: Major > Labels: pull-request-available > > The DurationWriter fails if more values are written than the initial capacity > of the DurationVector (4032). Fix by using `setSafe` instead of `set` method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45280) Change Maven daily test to use Java 17 for testing
[ https://issues.apache.org/jira/browse/SPARK-45280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45280: - Assignee: Yang Jie > Change Maven daily test use Java 17 for testing. > > > Key: SPARK-45280 > URL: https://issues.apache.org/jira/browse/SPARK-45280 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45280) Change Maven daily test to use Java 17 for testing
[ https://issues.apache.org/jira/browse/SPARK-45280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45280. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43057 [https://github.com/apache/spark/pull/43057] > Change Maven daily test use Java 17 for testing. > > > Key: SPARK-45280 > URL: https://issues.apache.org/jira/browse/SPARK-45280 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-36321: --- Labels: pull-request-available (was: ) > Do not fail application in kubernetes if name is too long > - > > Key: SPARK-36321 > URL: https://issues.apache.org/jira/browse/SPARK-36321 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > Labels: pull-request-available > > If we have a long spark app name and start it with a k8s master, we will get the > following exception. > {code:java} > java.lang.IllegalArgumentException: > 'a-89fe2f7ae71c3570' in > spark.kubernetes.executor.podNamePrefix is invalid. must conform > https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names > and the value length <= 47 > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108) > at > org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239) > at > org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214) > at org.apache.spark.SparkConf.get(SparkConf.scala:261) > at > org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67) > at > org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147) > at > org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367) > {code} > Using the app name as the executor pod name is Spark-internal behavior, and it > should not cause application failure.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
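One way to address the failure above is to derive a conforming prefix from the app name instead of rejecting it. The sketch below is hypothetical — `toPodNamePrefix` is an illustrative helper, not Spark's actual fix — and encodes the constraint from the exception: a DNS label of lowercase alphanumerics and `-`, starting and ending alphanumeric, with length <= 47:

```java
public class PodNamePrefix {
    // Hypothetical helper: sanitize an app name into a valid executor pod
    // name prefix rather than failing config validation.
    static String toPodNamePrefix(String appName) {
        String s = appName.toLowerCase().replaceAll("[^a-z0-9-]", "-");
        s = s.replaceAll("^-+", "");    // must not start with '-'
        if (s.length() > 47) {
            s = s.substring(0, 47);     // enforce the length limit
        }
        s = s.replaceAll("-+$", "");    // must not end with '-'
        return s;
    }

    public static void main(String[] args) {
        String longName = "My Very Long Spark App Name-89fe2f7ae71c3570-with-extra-suffix";
        String prefix = toPodNamePrefix(longName);
        System.out.println(prefix.length() <= 47);                       // true
        System.out.println(prefix.matches("[a-z0-9]([-a-z0-9]*[a-z0-9])?")); // true
    }
}
```

The trade-off is that truncation can make two distinct app names collide, so a real implementation would likely keep a unique suffix (such as the application ID) intact while truncating only the human-readable part.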
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768040#comment-17768040 ] Faiz Halde edited comment on SPARK-45255 at 9/22/23 2:31 PM: - to get past the error org/sparkproject/connect/client/com/google/common/cache/CacheLoader even after adding guava library, you need to copy their shading rules ``` (assembly / assemblyShadeRules) := Seq( ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll, ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll, ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll, ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll, ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll, ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll, ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll, ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll ), ``` was (Author: JIRAUSER300204): to get pas the error org/sparkproject/connect/client/com/google/common/cache/CacheLoader even after adding guava library, you need to copy their shading rules ``` (assembly / assemblyShadeRules) := Seq( ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll, ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll, ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll, ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll, ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll, ShadeRule.rename("io.perfmark.**" -> 
"org.sparkproject.connect.client.io.perfmark.@1").inAll, ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll, ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll ), ``` > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ 
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect client does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything, nor is it run under a `SPARK_HOME` (I guess that's the > whole point of the connect client). > > EDIT > Not sure if it's the right mitigation, but explicitly adding guava worked; > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM >
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768040#comment-17768040 ] Faiz Halde edited comment on SPARK-45255 at 9/22/23 2:31 PM: - to get past the error `org/sparkproject/connect/client/com/google/common/cache/CacheLoader` even after adding guava library, you need to copy their shading rules ``` (assembly / assemblyShadeRules) := Seq( ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll, ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll, ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll, ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll, ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll, ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll, ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll, ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll ), ``` was (Author: JIRAUSER300204): to get past the error org/sparkproject/connect/client/com/google/common/cache/CacheLoader even after adding guava library, you need to copy their shading rules ``` (assembly / assemblyShadeRules) := Seq( ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll, ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll, ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll, ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll, ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll, ShadeRule.rename("io.perfmark.**" -> 
"org.sparkproject.connect.client.io.perfmark.@1").inAll, ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll, ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll ), ``` > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ 
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > EDIT > Not sure if it's the right mitigation but explicitly adding guava worked but > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM >
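[Editor's note] For readers assembling a client jar with sbt, the pieces quoted in this thread fit together roughly as the build.sbt fragment below. This is only a sketch, not an official recipe: it assumes the sbt-assembly plugin is already enabled in project/plugins.sbt, and the rename prefixes are copied verbatim from the comment above (they must match the prefixes baked into the spark-connect-client-jvm artifact you actually depend on).
```scala
// build.sbt (sketch): dependency + shading rules mentioned in this thread.
// Assumes sbt-assembly is enabled in project/plugins.sbt.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0"
)

// Rename prefixes copied from the comment above; they mirror the shading
// the Spark Connect client performs during its own assembly.
(assembly / assemblyShadeRules) := Seq(
  ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
  ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
  ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
  ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
  ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,
  ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll,
  ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll,
  ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll
)
```
Note that the shade rules only apply to the jar produced by `sbt assembly`, not to `sbt run`, so the application must be launched from the assembled fat jar.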
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768040#comment-17768040 ] Faiz Halde commented on SPARK-45255: to get past the error org/sparkproject/connect/client/com/google/common/cache/CacheLoader even after adding the guava library, you need to copy their shading rules ``` (assembly / assemblyShadeRules) := Seq( ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll, ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll, ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll, ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll, ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll, ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.client.io.perfmark.@1").inAll, ShadeRule.rename("org.codehaus.**" -> "org.sparkproject.connect.client.org.codehaus.@1").inAll, ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.client.android.annotation.@1").inAll ), ``` > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: >
org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > EDIT > Not sure if it's the right mitigation but explicitly adding guava worked but > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM > org.sparkproject.connect.client.io.grpc.NameResolverRegistry > getDefaultRegistry}} > {{WARNING: No NameResolverProviders found via ServiceLoader, including for > DNS. This is probably due to a broken build. 
If using ProGuard, check your > configuration}} > {{Exception in thread "main" > org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: > > org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: > No functional channel service provider found. Try adding a dependency on the > grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}} > {{ at >
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768037#comment-17768037 ] Faiz Halde commented on SPARK-45255: For now, I unblocked myself by manually building spark connect {{build/mvn -Pconnect -DskipTests clean package}} {{and then running}} {{mkdir connect-jars}} {{./bin/spark-connect-scala-client-classpath | tr ':' '\n' | xargs -I{} cp {} connect-jars}} {{Then, in your client application, have the connect-jars directory in your classpath. Not sure if this is the right way though}} > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at 
Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > EDIT > Not sure if it's the right mitigation but explicitly adding guava worked but > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM > org.sparkproject.connect.client.io.grpc.NameResolverRegistry > getDefaultRegistry}} > {{WARNING: No NameResolverProviders found via ServiceLoader, including for > DNS. This is probably due to a broken build. If using ProGuard, check your > configuration}} > {{Exception in thread "main" > org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: > > org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: > No functional channel service provider found. 
Try adding a dependency on the > grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}} > {{ at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}} > {{ at scala.Option.getOrElse(Option.scala:189)}} > {{ at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{
[jira] [Comment Edited] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768037#comment-17768037 ] Faiz Halde edited comment on SPARK-45255 at 9/22/23 2:29 PM: - For now, I unblocked myself by manually building spark connect {{build/mvn -Pconnect -DskipTests clean package}} {{and then running}} {{mkdir connect-jars}} {{./bin/spark-connect-scala-client-classpath | tr ':' '\n' | xargs -I{} cp {} connect-jars}} {{Then, when starting your client application, have the connect-jars directory in your classpath. Not sure if this is the right way though}} was (Author: JIRAUSER300204): For now, I unblocked myself by manually building spark connect {{build/mvn -Pconnect -DskipTests clean package}} {{and then running}} {{mkdir connect-jars}} {{./bin/spark-connect-scala-client-classpath | tr ':' '\n' | xargs -I{} cp {} connect-jars}} {{Then, in your client application, have the connect-jars directory in your classpath. Not sure if this is the right way though}} > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at 
Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > {{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > EDIT > Not sure if it's the right mitigation but explicitly adding guava worked but > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM > org.sparkproject.connect.client.io.grpc.NameResolverRegistry > getDefaultRegistry}} > {{WARNING: No NameResolverProviders found via ServiceLoader, including for > DNS. This is probably due to a broken build. If using ProGuard, check your > configuration}} > {{Exception in thread "main" > org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: > > org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: > No functional channel service provider found. 
Try adding a dependency on the > grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}} > {{ at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}} > {{ at scala.Option.getOrElse(Option.scala:189)}} > {{ at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}} > {{ at
[jira] [Updated] (SPARK-45280) Change Maven daily test use Java 17 for testing.
[ https://issues.apache.org/jira/browse/SPARK-45280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45280: --- Labels: pull-request-available (was: ) > Change Maven daily test use Java 17 for testing. > > > Key: SPARK-45280 > URL: https://issues.apache.org/jira/browse/SPARK-45280 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45280) Change Maven daily test use Java 17 for testing.
Yang Jie created SPARK-45280: Summary: Change Maven daily test use Java 17 for testing. Key: SPARK-45280 URL: https://issues.apache.org/jira/browse/SPARK-45280 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45255) Spark connect client failing with java.lang.NoClassDefFoundError
[ https://issues.apache.org/jira/browse/SPARK-45255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768005#comment-17768005 ] Aleksandr Aleksandrov commented on SPARK-45255: --- I have the same issue. But adding guava dependency didn't help me {code:java} Exception in thread "main" java.lang.NoClassDefFoundError: org/sparkproject/connect/client/com/google/common/cache/CacheLoader at ... Caused by: java.lang.ClassNotFoundException: org.sparkproject.connect.client.com.google.common.cache.CacheLoader at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ... 2 more{code} > Spark connect client failing with java.lang.NoClassDefFoundError > > > Key: SPARK-45255 > URL: https://issues.apache.org/jira/browse/SPARK-45255 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > java 1.8, sbt 1.9, scala 2.12 > > I have a very simple repo with the following dependency in `build.sbt` > ``` > {{libraryDependencies ++= Seq("org.apache.spark" %% > "spark-connect-client-jvm" % "3.5.0")}} > ``` > A simple application > ``` > {{object Main extends App {}} > {{ val s = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{ s.read.json("/tmp/input.json").repartition(10).show(false)}} > {{}}} > ``` > But when I run it, I get the following error > > ``` > {{Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/connect/client/com/google/common/cache/CacheLoader}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)}} > 
{{ at scala.App.$anonfun$main$1$adapted(App.scala:80)}} > {{ at scala.collection.immutable.List.foreach(List.scala:431)}} > {{ at scala.App.main(App.scala:80)}} > {{ at scala.App.main$(App.scala:78)}} > {{ at Main$.main(Main.scala:3)}} > {{ at Main.main(Main.scala)}} > {{Caused by: java.lang.ClassNotFoundException: > org.sparkproject.connect.client.com.google.common.cache.CacheLoader}} > {{ at java.net.URLClassLoader.findClass(URLClassLoader.java:387)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:418)}} > {{ at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)}} > {{ at java.lang.ClassLoader.loadClass(ClassLoader.java:351)}} > {{ ... 11 more}} > ``` > I know the connect does a bunch of shading during assembly so it could be > related to that. This application is not started via spark-submit or > anything. It's not run neither under a `SPARK_HOME` ( I guess that's the > whole point of connect client ) > > EDIT > Not sure if it's the right mitigation but explicitly adding guava worked but > now I am in the 2nd territory of error > {{Sep 21, 2023 8:21:59 PM > org.sparkproject.connect.client.io.grpc.NameResolverRegistry > getDefaultRegistry}} > {{WARNING: No NameResolverProviders found via ServiceLoader, including for > DNS. This is probably due to a broken build. If using ProGuard, check your > configuration}} > {{Exception in thread "main" > org.sparkproject.connect.client.com.google.common.util.concurrent.UncheckedExecutionException: > > org.sparkproject.connect.client.io.grpc.ManagedChannelRegistry$ProviderNotFoundException: > No functional channel service provider found. 
Try adding a dependency on the > grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.get(LocalCache.java:4011)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)}} > {{ at > org.sparkproject.connect.client.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)}} > {{ at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:945)}} > {{ at scala.Option.getOrElse(Option.scala:189)}} > {{ at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:945)}} > {{ at Main$.delayedEndpoint$Main$1(Main.scala:4)}} > {{ at Main$delayedInit$body.apply(Main.scala:3)}} > {{ at scala.Function0.apply$mcV$sp(Function0.scala:39)}} > {{ at scala.Function0.apply$mcV$sp$(Function0.scala:39)}} > {{ at >
[jira] [Updated] (SPARK-45277) Install Java 17 for Windows SparkR test
[ https://issues.apache.org/jira/browse/SPARK-45277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45277: - Summary: Install Java 17 for Windows SparkR test (was: Install a Java 17 for windows SparkR test) > Install Java 17 for Windows SparkR test > --- > > Key: SPARK-45277 > URL: https://issues.apache.org/jira/browse/SPARK-45277 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45277) Install Java 17 for Windows SparkR test
[ https://issues.apache.org/jira/browse/SPARK-45277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45277: --- Labels: pull-request-available (was: ) > Install Java 17 for Windows SparkR test > --- > > Key: SPARK-45277 > URL: https://issues.apache.org/jira/browse/SPARK-45277 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45247) Upgrade Pandas to 2.1.1
[ https://issues.apache.org/jira/browse/SPARK-45247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45247: - Assignee: Haejoon Lee > Upgrade Pandas to 2.1.1 > --- > > Key: SPARK-45247 > URL: https://issues.apache.org/jira/browse/SPARK-45247 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > https://pandas.pydata.org/pandas-docs/dev/whatsnew/v2.1.1.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45247) Upgrade Pandas to 2.1.1
[ https://issues.apache.org/jira/browse/SPARK-45247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45247. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43025 [https://github.com/apache/spark/pull/43025] > Upgrade Pandas to 2.1.1 > --- > > Key: SPARK-45247 > URL: https://issues.apache.org/jira/browse/SPARK-45247 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://pandas.pydata.org/pandas-docs/dev/whatsnew/v2.1.1.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44112) Drop Java 8 and 11 support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44112: - Assignee: Yang Jie > Drop Java 8 and 11 support > -- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44112) Drop Java 8 and 11 support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44112. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43005 [https://github.com/apache/spark/pull/43005] > Drop Java 8 and 11 support > -- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map
[ https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43655. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43052 [https://github.com/apache/spark/pull/43052] > Enable NamespaceParityTests.test_get_index_map > -- > > Key: SPARK-43655 > URL: https://issues.apache.org/jira/browse/SPARK-43655 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable NamespaceParityTests.test_get_index_map -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43655) Enable NamespaceParityTests.test_get_index_map
[ https://issues.apache.org/jira/browse/SPARK-43655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43655: - Assignee: Haejoon Lee > Enable NamespaceParityTests.test_get_index_map > -- > > Key: SPARK-43655 > URL: https://issues.apache.org/jira/browse/SPARK-43655 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Enable NamespaceParityTests.test_get_index_map -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Fix Version/s: (was: 4.0.0) (was: 3.5.1) > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > > An improvement was made in SPARK-24203 so that the executor's bind address > is configurable. Unfortunately, this configuration hasn't been implemented in > Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind > the executor to the loopback interface or all interfaces. This Jira is to > allow Yarn to bind the executor to either the pod IP, the loopback interface, > or all interfaces, to allow mesh integrations like Istio with the cluster. > Another linked Jira, SPARK-42411, explained *Allowing binding to all IPs* > very well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Fix Version/s: 4.0.0 3.5.1 (was: 3.0.0) Affects Version/s: 3.5.0 (was: 4.0.0) (was: 3.5.1) > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > > An improvement was made in SPARK-24203 so that the executor's bind address > is configurable. Unfortunately, this configuration hasn't been implemented in > Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind > the executor to the loopback interface or all interfaces. This Jira is to > allow Yarn to bind the executor to either the pod IP, the loopback interface, > or all interfaces, to allow mesh integrations like Istio with the cluster. > Another linked Jira, SPARK-42411, explained *Allowing binding to all IPs* > very well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45279) Attach plan_id for all logical plan
[ https://issues.apache.org/jira/browse/SPARK-45279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45279: --- Labels: pull-request-available (was: ) > Attach plan_id for all logical plan > --- > > Key: SPARK-45279 > URL: https://issues.apache.org/jira/browse/SPARK-45279 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767932#comment-17767932 ] Hendra Saputra commented on SPARK-45278: PR is up for review [https://github.com/apache/spark/pull/42870]. Thanks > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > > An improvement was made in SPARK-24203 so that the executor's bind address > is configurable. Unfortunately, this configuration hasn't been implemented in > Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind > the executor to the loopback interface or all interfaces. This Jira is to > allow Yarn to bind the executor to either the pod IP, the loopback interface, > or all interfaces, to allow mesh integrations like Istio with the cluster. > Another linked Jira, SPARK-42411, explained *Allowing binding to all IPs* > very well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45279) Attach plan_id for all logical plan
Ruifeng Zheng created SPARK-45279: - Summary: Attach plan_id for all logical plan Key: SPARK-45279 URL: https://issues.apache.org/jira/browse/SPARK-45279 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Description: An improvement was made in SPARK-24203 so that the executor's bind address is configurable. Unfortunately, this configuration has not been implemented for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind the executor to the loopback interface or to all interfaces. This Jira is to allow Yarn to bind the executor to the pod IP, the loopback interface, or all interfaces, enabling service-mesh integration such as Istio with the cluster. Another linked Jira, SPARK-42411, explained *Allowing binding to all IPs* very well. was: An improvement was made in SPARK-24203 so that the executor's bind address is configurable. Unfortunately, this configuration has not been implemented for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind the executor to the loopback interface or to all interfaces. This Jira is to allow Yarn to bind the executor to the pod IP, the loopback interface, or all interfaces, enabling service-mesh integration such as Istio with the cluster. Another linked Jira, SPARK-42411, explained this very well. > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > > An improvement was made in SPARK-24203 so that the executor's bind address > is configurable. Unfortunately, this configuration has not been implemented > for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to > bind the executor to the loopback interface or to all interfaces. This Jira > is to allow Yarn to bind the executor to the pod IP, the loopback interface, > or all interfaces, enabling service-mesh integration such as Istio with the > cluster. 
> Another linked Jira, SPARK-42411, explained *Allowing binding to all IPs* > very well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Description: An improvement was made in SPARK-24203 so that the executor's bind address is configurable. Unfortunately, this configuration has not been implemented for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind the executor to the loopback interface or to all interfaces. This Jira is to allow Yarn to bind the executor to the pod IP, the loopback interface, or all interfaces, enabling service-mesh integration such as Istio with the cluster. Another linked Jira, SPARK-42411, explained this very well. was:An improvement was made in SPARK-24203 so that the executor's bind address is configurable. Unfortunately, this configuration has not been implemented for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind the executor to the loopback interface or to all interfaces. This Jira is to allow Yarn to bind the executor to the pod IP, the loopback interface, or all interfaces, enabling service-mesh integration such as Istio with the cluster. > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > > An improvement was made in SPARK-24203 so that the executor's bind address > is configurable. Unfortunately, this configuration has not been implemented > for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to > bind the executor to the loopback interface or to all interfaces. This Jira > is to allow Yarn to bind the executor to the pod IP, the loopback interface, > or all interfaces, enabling service-mesh integration such as Istio with the > cluster. 
> Another linked Jira, SPARK-42411, explained this very well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Description: An improvement was made in SPARK-24203 so that the executor's bind address is configurable. Unfortunately, this configuration has not been implemented for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to bind the executor to the loopback interface or to all interfaces. This Jira is to allow Yarn to bind the executor to the pod IP, the loopback interface, or all interfaces, enabling service-mesh integration such as Istio with the cluster (was: Previous improvement is made that now Executor bind address is configurable in ) > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > > An improvement was made in SPARK-24203 so that the executor's bind address > is configurable. Unfortunately, this configuration has not been implemented > for Yarn. When a Yarn cluster is deployed in Kubernetes, it is preferable to > bind the executor to the loopback interface or to all interfaces. This Jira > is to allow Yarn to bind the executor to the pod IP, the loopback interface, > or all interfaces, enabling service-mesh integration such as Istio with the > cluster -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Affects Version/s: 4.0.0 3.5.1 (was: 2.1.1) > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45278) Make Yarn executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-45278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hendra Saputra updated SPARK-45278: --- Description: Previous improvement is made that now Executor bind address is configurable in > Make Yarn executor's bindAddress configurable > - > > Key: SPARK-45278 > URL: https://issues.apache.org/jira/browse/SPARK-45278 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Hendra Saputra >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > > Previous improvement is made that now Executor bind address is configurable > in -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45278) Make Yarn executor's bindAddress configurable
Hendra Saputra created SPARK-45278: -- Summary: Make Yarn executor's bindAddress configurable Key: SPARK-45278 URL: https://issues.apache.org/jira/browse/SPARK-45278 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.1.1 Reporter: Hendra Saputra Assignee: Nishchal Venkataramana Fix For: 3.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44112) Drop Java 8 and 11 support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-44112: -- Assignee: (was: Apache Spark) > Drop Java 8 and 11 support > -- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44112) Drop Java 8 and 11 support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-44112: -- Assignee: Apache Spark > Drop Java 8 and 11 support > -- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45277) Install a Java 17 for windows SparkR test
Yang Jie created SPARK-45277: Summary: Install a Java 17 for windows SparkR test Key: SPARK-45277 URL: https://issues.apache.org/jira/browse/SPARK-45277 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45276) Replace Java 8 and Java 11 installed in the Dockerfile with Java
Yang Jie created SPARK-45276: Summary: Replace Java 8 and Java 11 installed in the Dockerfile with Java Key: SPARK-45276 URL: https://issues.apache.org/jira/browse/SPARK-45276 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Yang Jie This includes dev/create-release/spark-rm/Dockerfile and connector/docker/spark-test/base/Dockerfile. There might be others as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45175) download krb5.conf from remote storage in spark-submit on k8s
[ https://issues.apache.org/jira/browse/SPARK-45175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767908#comment-17767908 ] Qian Sun commented on SPARK-45175: -- In multi-tenant scenarios, I find that Apache Spark provides *{{spark.kubernetes.kerberos.krb5.configMapName}}* to mount a ConfigMap containing the {{*krb5.conf*}} file; we could manage these files by creating a separate ConfigMap per tenant. > download krb5.conf from remote storage in spark-submit on k8s > - > > Key: SPARK-45175 > URL: https://issues.apache.org/jira/browse/SPARK-45175 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.1 >Reporter: Qian Sun >Priority: Minor > Labels: pull-request-available > > krb5.conf currently only supports the local file format. Tenants would like > to save this file on their own servers and download it during the > spark-submit phase for better implementation of multi-tenant scenarios. The > proposed solution is to use the *downloadFile* function[1], similar to the > configuration of *spark.kubernetes.driver/executor.podTemplateFile* > > [1]https://github.com/apache/spark/blob/822f58f0d26b7d760469151a65eaf9ee863a07a1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala#L82C24-L82C24 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
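A sketch of the ConfigMap-per-tenant workaround mentioned in the comment (the ConfigMap name `tenant-a-krb5`, file path, and API server address are hypothetical; `spark.kubernetes.kerberos.krb5.configMapName` is the documented Spark-on-K8s property):

```shell
# Create one ConfigMap per tenant from that tenant's krb5.conf
# (names and paths below are placeholder examples).
kubectl create configmap tenant-a-krb5 \
  --from-file=krb5.conf=/etc/tenants/tenant-a/krb5.conf

# Point spark-submit at the tenant's ConfigMap instead of a local file.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.kerberos.krb5.configMapName=tenant-a-krb5 \
  ...
```

This avoids shipping krb5.conf with the submission itself, though unlike the proposed `downloadFile`-based approach it requires each tenant's file to be pre-loaded into the cluster.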
[jira] [Resolved] (SPARK-45242) Use DataFrame ID to semantically validate CollectMetrics
[ https://issues.apache.org/jira/browse/SPARK-45242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang resolved SPARK-45242. -- Fix Version/s: 4.0.0 Resolution: Fixed https://github.com/apache/spark/pull/43010 > Use DataFrame ID to semantically validate CollectMetrics > - > > Key: SPARK-45242 > URL: https://issues.apache.org/jira/browse/SPARK-45242 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45275) replace function fails to handle null replace param
[ https://issues.apache.org/jira/browse/SPARK-45275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diogo Marques updated SPARK-45275: -- Attachment: replace_bug.png > replace function fails to handle null replace param > --- > > Key: SPARK-45275 > URL: https://issues.apache.org/jira/browse/SPARK-45275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Diogo Marques >Priority: Major > Attachments: replace_bug.png > > > [replace |https://spark.apache.org/docs/latest/api/sql/#replace]function > fails to handle null replace param, example below: > > df.withColumn('test',F.expr('replace(col1, "nUll", 1)')).show() > || ||col1||col2||test|| > ||0|person1|0.0|person1| > ||1|person1|2.0|person1| > ||2|person1|3.0|person1| > ||3|person2|1.0|person2| > ||4|None|2.0|None| > ||5|nUll|None|1| > > df.withColumn('test',F.expr('replace(col1, "nUll", null)')).show() > || ||col1||col2||test|| > ||0|person1|0.0|None| > ||1|person1|2.0|None| > ||2|person1|3.0|None| > ||3|person2|1.0|None| > ||4|None|2.0|None| > ||5|nUll|None|None| > > > This function has been ported over to 3.5.0 but I've not been able to test it > on that yet -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45275) replace function fails to handle null replace param
[ https://issues.apache.org/jira/browse/SPARK-45275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diogo Marques updated SPARK-45275: -- Description: [replace |https://spark.apache.org/docs/latest/api/sql/#replace]function fails to handle null replace param, example below: df.withColumn('test',F.expr('replace(col1, "nUll", 1)')).show() || ||col1||col2||test|| ||0|person1|0.0|person1| ||1|person1|2.0|person1| ||2|person1|3.0|person1| ||3|person2|1.0|person2| ||4|None|2.0|None| ||5|nUll|None|1| df.withColumn('test',F.expr('replace(col1, "nUll", null)')).show() || ||col1||col2||test|| ||0|person1|0.0|None| ||1|person1|2.0|None| ||2|person1|3.0|None| ||3|person2|1.0|None| ||4|None|2.0|None| ||5|nUll|None|None| This function has been ported over to 3.5.0 but I've not been able to test it on that yet was: [replace |https://spark.apache.org/docs/latest/api/sql/#replace]function fails to handle null replace param, example below: df.withColumn('test',F.expr('replace(col1, "nUll", 1)')).show() || ||col1||col2||test|| ||0|person1|0.0|person1| ||1|person1|2.0|person1| ||2|person1|3.0|person1| ||3|person2|1.0|person2| ||4|None|2.0|None| ||5|nUll|None|1| df.withColumn('test',F.expr('replace(col1, "nUll", null)')).show() || ||col1||col2||test|| ||0|person1|0.0|None| ||1|person1|2.0|None| ||2|person1|3.0|None| ||3|person2|1.0|None| ||4|None|2.0|None| ||5|nUll|None|None| > replace function fails to handle null replace param > --- > > Key: SPARK-45275 > URL: https://issues.apache.org/jira/browse/SPARK-45275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Diogo Marques >Priority: Major > > [replace |https://spark.apache.org/docs/latest/api/sql/#replace]function > fails to handle null replace param, example below: > > df.withColumn('test',F.expr('replace(col1, "nUll", 1)')).show() > || ||col1||col2||test|| > ||0|person1|0.0|person1| > ||1|person1|2.0|person1| > ||2|person1|3.0|person1| > ||3|person2|1.0|person2| > 
||4|None|2.0|None| > ||5|nUll|None|1| > > df.withColumn('test',F.expr('replace(col1, "nUll", null)')).show() > || ||col1||col2||test|| > ||0|person1|0.0|None| > ||1|person1|2.0|None| > ||2|person1|3.0|None| > ||3|person2|1.0|None| > ||4|None|2.0|None| > ||5|nUll|None|None| > > > This function has been ported over to 3.5.0 but I've not been able to test it > on that yet -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45275) replace function fails to handle null replace param
[ https://issues.apache.org/jira/browse/SPARK-45275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diogo Marques updated SPARK-45275: -- Priority: Trivial (was: Major) > replace function fails to handle null replace param > --- > > Key: SPARK-45275 > URL: https://issues.apache.org/jira/browse/SPARK-45275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Diogo Marques >Priority: Trivial > > [replace |https://spark.apache.org/docs/latest/api/sql/#replace]function > fails to handle null replace param, example below: > > df.withColumn('test',F.expr('replace(col1, "nUll", 1)')).show() > || ||col1||col2||test|| > ||0|person1|0.0|person1| > ||1|person1|2.0|person1| > ||2|person1|3.0|person1| > ||3|person2|1.0|person2| > ||4|None|2.0|None| > ||5|nUll|None|1| > > df.withColumn('test',F.expr('replace(col1, "nUll", null)')).show() > || ||col1||col2||test|| > ||0|person1|0.0|None| > ||1|person1|2.0|None| > ||2|person1|3.0|None| > ||3|person2|1.0|None| > ||4|None|2.0|None| > ||5|nUll|None|None| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org