[jira] [Assigned] (SPARK-45730) [CORE] Make ReloadingX509TrustManagerSuite less flaky
[ https://issues.apache.org/jira/browse/SPARK-45730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-45730: --- Assignee: Hasnain Lakhani > [CORE] Make ReloadingX509TrustManagerSuite less flaky > - > > Key: SPARK-45730 > URL: https://issues.apache.org/jira/browse/SPARK-45730 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45730) [CORE] Make ReloadingX509TrustManagerSuite less flaky
[ https://issues.apache.org/jira/browse/SPARK-45730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-45730. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43596 [https://github.com/apache/spark/pull/43596] > [CORE] Make ReloadingX509TrustManagerSuite less flaky > - > > Key: SPARK-45730 > URL: https://issues.apache.org/jira/browse/SPARK-45730 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45556) Inconsistent status code between web page and REST API when exception is thrown
[ https://issues.apache.org/jira/browse/SPARK-45556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45556: --- Labels: pull-request-available (was: ) > Inconsistent status code between web page and REST API when exception is > thrown > --- > > Key: SPARK-45556 > URL: https://issues.apache.org/jira/browse/SPARK-45556 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.5.0 >Reporter: wy >Priority: Minor > Labels: pull-request-available > > Spark history server provides > [AppHistoryServerPlugin|https://github.com/kuwii/spark/blob/dev/status-code/core/src/main/scala/org/apache/spark/status/AppHistoryServerPlugin.scala] > to add extra REST APIs and web pages. However, there's an issue when > exceptions are thrown, causing an inconsistent status code between the web page > and the REST API. > For the REST API, if the thrown exception is an instance of > WebApplicationException, then the status code is set to the one defined > within the exception. > However, for the web page, all exceptions are wrapped in a 500 response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
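The mismatch described above can be modeled in a few lines. This is a toy sketch, not Spark's actual handler code: the class and both functions below are hypothetical stand-ins for the two error-handling paths.

```scala
// Toy model of the inconsistency: the REST layer honors the status carried by
// a WebApplicationException, while the web-page layer wraps everything in 500.
class WebApplicationException(val status: Int) extends RuntimeException

// REST API path: keep the status declared inside the exception.
def restStatus(e: Throwable): Int = e match {
  case w: WebApplicationException => w.status
  case _                          => 500
}

// Web page path: every exception becomes a 500, hence the mismatch.
def pageStatus(e: Throwable): Int = 500

object StatusDemo extends App {
  val notFound = new WebApplicationException(404)
  assert(restStatus(notFound) == 404)
  assert(pageStatus(notFound) == 500) // same exception, different status code
  println("ok")
}
```

The fix the issue implies is to make the page path inspect WebApplicationException the same way the REST path does.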
[jira] [Updated] (SPARK-45777) Support `spark.test.appId` in `LocalSchedulerBackend`
[ https://issues.apache.org/jira/browse/SPARK-45777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45777: --- Labels: pull-request-available (was: ) > Support `spark.test.appId` in `LocalSchedulerBackend` > - > > Key: SPARK-45777 > URL: https://issues.apache.org/jira/browse/SPARK-45777 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45777) Support `spark.test.appId` in `LocalSchedulerBackend`
Dongjoon Hyun created SPARK-45777: - Summary: Support `spark.test.appId` in `LocalSchedulerBackend` Key: SPARK-45777 URL: https://issues.apache.org/jira/browse/SPARK-45777 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
[ https://issues.apache.org/jira/browse/SPARK-45776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45776: --- Labels: pull-request-available (was: ) > Remove the defensive null check added in SPARK-39553. > - > > Key: SPARK-45776 > URL: https://issues.apache.org/jira/browse/SPARK-45776 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > > {code:java} > def unregisterShuffle(shuffleId: Int): Unit = { > shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => > // SPARK-39553: Add protection for Scala 2.13 due to > https://github.com/scala/bug/issues/12613 > // We should revert this if Scala 2.13 solves this issue. > if (shuffleStatus != null) { > shuffleStatus.invalidateSerializedMapOutputStatusCache() > shuffleStatus.invalidateSerializedMergeOutputStatusCache() > } > } > } {code} > This issue has been fixed in Scala 2.13.9. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45774) Support `spark.ui.historyServerUrl` in ApplicationPage
[ https://issues.apache.org/jira/browse/SPARK-45774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45774: - Assignee: Dongjoon Hyun > Support `spark.ui.historyServerUrl` in ApplicationPage > -- > > Key: SPARK-45774 > URL: https://issues.apache.org/jira/browse/SPARK-45774 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45774) Support `spark.ui.historyServerUrl` in `ApplicationPage`
[ https://issues.apache.org/jira/browse/SPARK-45774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45774: -- Summary: Support `spark.ui.historyServerUrl` in `ApplicationPage` (was: Support `spark.ui.historyServerUrl` in ApplicationPage) > Support `spark.ui.historyServerUrl` in `ApplicationPage` > > > Key: SPARK-45774 > URL: https://issues.apache.org/jira/browse/SPARK-45774 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
[ https://issues.apache.org/jira/browse/SPARK-45776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45776: - Description: {code:java} def unregisterShuffle(shuffleId: Int): Unit = { shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => // SPARK-39553: Add protection for Scala 2.13 due to https://github.com/scala/bug/issues/12613 // We should revert this if Scala 2.13 solves this issue. if (shuffleStatus != null) { shuffleStatus.invalidateSerializedMapOutputStatusCache() shuffleStatus.invalidateSerializedMergeOutputStatusCache() } } } {code} This issue has been fixed in Scala 2.13.9. > Remove the defensive null check added in SPARK-39553. > - > > Key: SPARK-45776 > URL: https://issues.apache.org/jira/browse/SPARK-45776 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > {code:java} > def unregisterShuffle(shuffleId: Int): Unit = { > shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => > // SPARK-39553: Add protection for Scala 2.13 due to > https://github.com/scala/bug/issues/12613 > // We should revert this if Scala 2.13 solves this issue. > if (shuffleStatus != null) { > shuffleStatus.invalidateSerializedMapOutputStatusCache() > shuffleStatus.invalidateSerializedMergeOutputStatusCache() > } > } > } {code} > This issue has been fixed in Scala 2.13.9. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
[ https://issues.apache.org/jira/browse/SPARK-45776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45776: - Environment: (was: {code:java} def unregisterShuffle(shuffleId: Int): Unit = { shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => // SPARK-39553: Add protection for Scala 2.13 due to https://github.com/scala/bug/issues/12613 // We should revert this if Scala 2.13 solves this issue. if (shuffleStatus != null) { shuffleStatus.invalidateSerializedMapOutputStatusCache() shuffleStatus.invalidateSerializedMergeOutputStatusCache() } } } {code} This issue has been fixed in Scala 2.13.9.) > Remove the defensive null check added in SPARK-39553. > - > > Key: SPARK-45776 > URL: https://issues.apache.org/jira/browse/SPARK-45776 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45776) Remove the defensive null check added in SPARK-39553.
Yang Jie created SPARK-45776: Summary: Remove the defensive null check added in SPARK-39553. Key: SPARK-45776 URL: https://issues.apache.org/jira/browse/SPARK-45776 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Environment: {code:java} def unregisterShuffle(shuffleId: Int): Unit = { shuffleStatuses.remove(shuffleId).foreach { shuffleStatus => // SPARK-39553: Add protection for Scala 2.13 due to https://github.com/scala/bug/issues/12613 // We should revert this if Scala 2.13 solves this issue. if (shuffleStatus != null) { shuffleStatus.invalidateSerializedMapOutputStatusCache() shuffleStatus.invalidateSerializedMergeOutputStatusCache() } } } {code} This issue has been fixed in Scala 2.13.9. Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45775) Drop table skipped when CatalogV2Util loadTable meets an unexpected exception
konwu created SPARK-45775: - Summary: Drop table skipped when CatalogV2Util loadTable meets an unexpected exception Key: SPARK-45775 URL: https://issues.apache.org/jira/browse/SPARK-45775 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.3 Environment: spark 3.1.3 Reporter: konwu

Currently the CatalogV2Util.loadTable method catches only the NoSuch*Exception types, like below:

{code:java}
def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
  try {
    Option(catalog.asTableCatalog.loadTable(ident))
  } catch {
    case _: NoSuchTableException => None
    case _: NoSuchDatabaseException => None
    case _: NoSuchNamespaceException => None
  }
{code}

This skips the drop of the table when communication with the metastore times out or another unexpected exception occurs, because the method always returns None. Maybe we should catch it like below:

{code:java}
def loadTable(catalog: CatalogPlugin, ident: Identifier): Option[Table] =
  try {
    Option(catalog.asTableCatalog.loadTable(ident))
  } catch {
    case e: NoSuchTableException => None
    case e: NoSuchDatabaseException => None
    case e: NoSuchNamespaceException => None
    case e: Throwable => throw e
  }
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
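For illustration, here is a self-contained sketch of the behavior the report asks for, using hypothetical exception classes rather than Spark's real NoSuch*Exception types. Note that an explicit `case e: Throwable => throw e` is equivalent to simply omitting the catch-all, which is what this sketch does:

```scala
// Hypothetical stand-ins for Spark's NoSuch*Exception types, for illustration only.
class NoSuchTableException(msg: String) extends Exception(msg)
class MetastoreTimeoutException(msg: String) extends Exception(msg)

// Sketch of the desired behavior: swallow only "not found" errors and let
// everything else (e.g. a metastore timeout) propagate to the caller, so a
// DROP TABLE is not silently skipped when the metastore is merely unreachable.
def loadTable(load: () => Option[String]): Option[String] =
  try load()
  catch {
    case _: NoSuchTableException => None
    // No catch-all: any other Throwable propagates.
  }

object LoadTableDemo extends App {
  // A missing table maps to None.
  assert(loadTable(() => throw new NoSuchTableException("t")).isEmpty)
  // A timeout propagates instead of being swallowed as None.
  val propagated =
    try { loadTable(() => throw new MetastoreTimeoutException("timeout")); false }
    catch { case _: MetastoreTimeoutException => true }
  assert(propagated)
  println("ok")
}
```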
[jira] [Updated] (SPARK-45774) Support `spark.ui.historyServerUrl` in ApplicationPage
[ https://issues.apache.org/jira/browse/SPARK-45774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45774: --- Labels: pull-request-available (was: ) > Support `spark.ui.historyServerUrl` in ApplicationPage > -- > > Key: SPARK-45774 > URL: https://issues.apache.org/jira/browse/SPARK-45774 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45774) Support `spark.ui.historyServerUrl` in ApplicationPage
Dongjoon Hyun created SPARK-45774: - Summary: Support `spark.ui.historyServerUrl` in ApplicationPage Key: SPARK-45774 URL: https://issues.apache.org/jira/browse/SPARK-45774 Project: Spark Issue Type: Sub-task Components: Spark Core, Web UI Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45695) Fix `method force in trait View is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45695. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43637 [https://github.com/apache/spark/pull/43637] > Fix `method force in trait View is deprecated` > -- > > Key: SPARK-45695 > URL: https://issues.apache.org/jira/browse/SPARK-45695 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Tengfei Huang >Priority: Minor > Fix For: 4.0.0 > > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:368:36: > method force in trait View is deprecated (since 2.13.0): Views no longer > know about their underlying collection type; .force always returns an > IndexedSeq > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.trees.TreeNode.legacyWithNewChildren.newArgs.$anonfun, > origin=scala.collection.View.force, version=2.13. > [warn] m.mapValues(mapChild).view.force.toMap > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
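The deprecated call at `TreeNode.scala:368` has a mechanical replacement. A minimal sketch, assuming Scala 2.13 (the map and function below are made up for the demo): going through a lazy MapView and materializing with toMap produces the same Map without the deprecated `.force`.

```scala
object ViewForceDemo extends App {
  val m = Map(1 -> "a", 2 -> "b")
  val mapChild: String => String = _.toUpperCase

  // Deprecated on Scala 2.13: .force on a View always returns an IndexedSeq
  // and no longer remembers the underlying collection type.
  // val out = m.mapValues(mapChild).view.force.toMap

  // Equivalent without the deprecated call: transform lazily through a
  // MapView, then materialize directly with toMap.
  val out = m.view.mapValues(mapChild).toMap

  assert(out == Map(1 -> "A", 2 -> "B"))
  println(out)
}
```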
[jira] [Assigned] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45694: Assignee: Tengfei Huang > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Tengfei Huang >Priority: Minor > Labels: pull-request-available > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scalalang:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45694. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43637 [https://github.com/apache/spark/pull/43637] > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Tengfei Huang >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scalalang:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
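The replacement the warning itself suggests is a one-word change. A minimal sketch, assuming Scala 2.13 (the useCount value is made up for the demo):

```scala
object SignumDemo extends App {
  val useCount: Int = -3

  // Deprecated since 2.13.0 (scala.runtime.ScalaNumberProxy.signum):
  // val uc = useCount.signum

  // Replacement suggested by the warning: use `sign` instead.
  val uc = useCount.sign

  assert(uc == -1)
  assert(5.sign == 1 && 0.sign == 0) // same -1/0/1 contract as signum
  println(uc)
}
```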
[jira] [Updated] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45687: --- Labels: pull-request-available (was: ) > Fix `Passing an explicit array value to a Scala varargs method is deprecated` > - > > Key: SPARK-45687 > URL: https://issues.apache.org/jira/browse/SPARK-45687 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0 > [warn] df.agg(udaf(allColumns: _*)), > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn]
^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, > aggFunctions.tail: _*), > [warn] > ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
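The warning names its own fix: wrap the array without copying. A minimal sketch assuming Scala 2.13, with a hypothetical varargs method `agg` standing in for the DataFrame API:

```scala
import scala.collection.immutable.ArraySeq

object VarargsDemo extends App {
  // Hypothetical varargs method, standing in for df.agg(...) in the warnings.
  def agg(cols: String*): String = cols.mkString(",")

  val allColumns: Array[String] = Array("a", "b", "c")

  // Deprecated on Scala 2.13: passing an explicit Array to a varargs
  // parameter triggers a defensive copy.
  // agg(allColumns: _*)

  // Non-copying alternative recommended by the warning: wrap the array in an
  // immutable ArraySeq backed by the same storage.
  val out = agg(ArraySeq.unsafeWrapArray(allColumns): _*)

  assert(out == "a,b,c")
  println(out)
}
```

`unsafeWrapArray` is "unsafe" only in that later mutation of the original array would show through the wrapper, which is fine when the array is not mutated afterwards.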
[jira] [Resolved] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45768. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43635 [https://github.com/apache/spark/pull/43635] > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The faulthandler feature within PySpark is really useful, especially for > debugging errors that the regular Python interpreter cannot catch out of the > box, such as segmentation faults; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45768: Assignee: Hyukjin Kwon > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The faulthandler feature within PySpark is really useful, especially for > debugging errors that the regular Python interpreter cannot catch out of the > box, such as segmentation faults; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44065) Optimize BroadcastHashJoin skew when localShuffleReader is disabled
[ https://issues.apache.org/jira/browse/SPARK-44065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44065: --- Labels: pull-request-available (was: ) > Optimize BroadcastHashJoin skew when localShuffleReader is disabled > --- > > Key: SPARK-44065 > URL: https://issues.apache.org/jira/browse/SPARK-44065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Zhen Wang >Priority: Major > Labels: pull-request-available > > In RemoteShuffleService services such as uniffle and celeborn, it is > recommended to disable localShuffleReader by default for better performance. > But it may make BroadcastHashJoin skewed, so I want to optimize > BroadcastHashJoin skew in OptimizeSkewedJoin when localShuffleReader is > disabled. > > Refer to: > https://github.com/apache/incubator-celeborn#spark-configuration > https://github.com/apache/incubator-uniffle/blob/master/docs/client_guide.md#support-spark-aqe -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44517) first operator should respect the nullability of child expression as well as ignoreNulls option
[ https://issues.apache.org/jira/browse/SPARK-44517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44517: --- Labels: pull-request-available (was: ) > first operator should respect the nullability of child expression as well as > ignoreNulls option > --- > > Key: SPARK-44517 > URL: https://issues.apache.org/jira/browse/SPARK-44517 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, > 3.4.0, 3.4.1 >Reporter: Nan Zhu >Priority: Major > Labels: pull-request-available > > I found the following problem when using Spark recently: > > {code:java} > import spark.implicits._ > val s = Seq((1.2, "s", 2.2)).toDF("v1", "v2", "v3") > val schema = StructType(Seq(StructField("v1", DoubleType, nullable = false), > StructField("v2", StringType, nullable = true), > StructField("v3", DoubleType, nullable = false))) > val df = spark.createDataFrame(s.rdd, schema) > val inputDF = df.dropDuplicates("v3") > spark.sql("CREATE TABLE local.db.table (\n v1 DOUBLE NOT NULL,\n v2 STRING, v3 DOUBLE NOT NULL)") > inputDF.write.mode("overwrite").format("iceberg").save("local.db.table") {code} > > When I use the above code to write to Iceberg (I guess Delta Lake will have > the same problem), I got a very confusing exception: > {code:java} > Exception in thread "main" java.lang.IllegalArgumentException: Cannot write > incompatible dataset to table with schema: > table > { 1: v1: required double 2: v2: optional string 3: v3: required double} > Provided schema: > table { 1: v1: optional double 2: v2: optional string 3: v3: required > double} {code} > Basically, it complains that v1 is a nullable column in our > `inputDF` above, which is not allowed since we created the table with v1 as > not nullable.
The confusion comes from the fact that, if we check the schema of inputDF with > printSchema(), v1 is not nullable: > {noformat} > root > |-- v1: double (nullable = false) > |-- v2: string (nullable = true) > |-- v3: double (nullable = false){noformat} > Clearly, something changed v1's nullability unexpectedly! > > After some debugging, I found that the key is the dropDuplicates("v3") call. In the > optimization phase, ReplaceDeduplicateWithAggregate replaces the > Deduplicate with an aggregate on v3 that runs first() over all other columns. > However, the first() operator hard-codes nullable as always "true", which is > the source of v1's changed nullability. > > This is very confusing behavior in Spark, and probably no one noticed it > because nullability mattered little before the new table formats like Delta > Lake and Iceberg, which check nullability correctly. Nowadays users adopt > them more and more, so the issue has surfaced. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
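The report above can be condensed into a toy model. This is not Spark's real ReplaceDeduplicateWithAggregate rule, just a hypothetical simplification showing how wrapping every non-key column in a first() whose nullability is hard-coded to true flips v1's nullability:

```scala
// Toy model of the bug: dropDuplicates("v3") is rewritten into an aggregate
// that wraps every non-key column in first(...), and first(...) reports
// nullable = true regardless of its child expression.
case class Field(name: String, nullable: Boolean)

// Hypothetical simplification of the rewrite, for illustration only.
def replaceDeduplicateWithAggregate(schema: Seq[Field], keys: Set[String]): Seq[Field] =
  schema.map { f =>
    if (keys.contains(f.name)) f            // grouping key keeps its nullability
    else f.copy(nullable = true)            // first() hard-codes nullable = true
  }

object NullabilityDemo extends App {
  val input = Seq(Field("v1", false), Field("v2", true), Field("v3", false))
  val out   = replaceDeduplicateWithAggregate(input, Set("v3"))
  // v1 unexpectedly becomes nullable; v3 (the dedup key) is unchanged.
  assert(out == Seq(Field("v1", true), Field("v2", true), Field("v3", false)))
  println(out)
}
```

The fix the issue title asks for is to make first() propagate its child's nullability (subject to the ignoreNulls option) instead of the hard-coded true.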
[jira] [Updated] (SPARK-4836) Web UI should display separate information for all stage attempts
[ https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-4836: -- Labels: bulk-closed pull-request-available (was: bulk-closed) > Web UI should display separate information for all stage attempts > - > > Key: SPARK-4836 > URL: https://issues.apache.org/jira/browse/SPARK-4836 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.1.1, 1.2.0 >Reporter: Josh Rosen >Priority: Major > Labels: bulk-closed, pull-request-available > > I've run into some cases where the web UI job page will say that a job took > 12 minutes but the sum of that job's stage times is something like 10 > seconds. In this case, it turns out that my job ran a stage to completion > (which took, say, 5 minutes) then lost some partitions of that stage and had > to run a new stage attempt to recompute one or two tasks from that stage. As > a result, the latest attempt for that stage reports only one or two tasks. > In the web UI, it seems that we only show the latest stage attempt, not all > attempts, which can lead to confusing / misleading displays for jobs with > failed / partially-recomputed stages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-4836) Web UI should display separate information for all stage attempts
[ https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reopened SPARK-4836: --- > Web UI should display separate information for all stage attempts > - > > Key: SPARK-4836 > URL: https://issues.apache.org/jira/browse/SPARK-4836 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.1.1, 1.2.0 >Reporter: Josh Rosen >Priority: Major > Labels: bulk-closed > > I've run into some cases where the web UI job page will say that a job took > 12 minutes but the sum of that job's stage times is something like 10 > seconds. In this case, it turns out that my job ran a stage to completion > (which took, say, 5 minutes) then lost some partitions of that stage and had > to run a new stage attempt to recompute one or two tasks from that stage. As > a result, the latest attempt for that stage reports only one or two tasks. > In the web UI, it seems that we only show the latest stage attempt, not all > attempts, which can lead to confusing / misleading displays for jobs with > failed / partially-recomputed stages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45773) Refine docstring of `SparkSession.builder.config`
[ https://issues.apache.org/jira/browse/SPARK-45773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45773: --- Labels: pull-request-available (was: ) > Refine docstring of `SparkSession.builder.config` > - > > Key: SPARK-45773 > URL: https://issues.apache.org/jira/browse/SPARK-45773 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of SparkSession.builder.config > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45773) Refine docstring of `SparkSession.builder.config`
Allison Wang created SPARK-45773: Summary: Refine docstring of `SparkSession.builder.config` Key: SPARK-45773 URL: https://issues.apache.org/jira/browse/SPARK-45773 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of SparkSession.builder.config -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45757) Avoid re-computation of NNZ in Binarizer
[ https://issues.apache.org/jira/browse/SPARK-45757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45757: - Assignee: Ruifeng Zheng > Avoid re-computation of NNZ in Binarizer > > > Key: SPARK-45757 > URL: https://issues.apache.org/jira/browse/SPARK-45757 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45757) Avoid re-computation of NNZ in Binarizer
[ https://issues.apache.org/jira/browse/SPARK-45757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45757. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43619 [https://github.com/apache/spark/pull/43619] > Avoid re-computation of NNZ in Binarizer > > > Key: SPARK-45757 > URL: https://issues.apache.org/jira/browse/SPARK-45757 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45718) Remove remaining deprecated Pandas APIs from Spark 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-45718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45718. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43581 [https://github.com/apache/spark/pull/43581] > Remove remaining deprecated Pandas APIs from Spark 3.4.0 > > > Key: SPARK-45718 > URL: https://issues.apache.org/jira/browse/SPARK-45718 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Remove remaining deprecated Pandas APIs from Spark 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45718) Remove remaining deprecated Pandas APIs from Spark 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-45718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45718: - Assignee: Haejoon Lee > Remove remaining deprecated Pandas APIs from Spark 3.4.0 > > > Key: SPARK-45718 > URL: https://issues.apache.org/jira/browse/SPARK-45718 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Remove remaining deprecated Pandas APIs from Spark 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45771: - Assignee: Dongjoon Hyun > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45771. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43638 [https://github.com/apache/spark/pull/43638] > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45772) Add additional test coverage for input_file_name() expr + Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-45772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Utkarsh Agarwal updated SPARK-45772: Summary: Add additional test coverage for input_file_name() expr + Python UDFs (was: Add additional test coverage for input_file_name_expr) > Add additional test coverage for input_file_name() expr + Python UDFs > - > > Key: SPARK-45772 > URL: https://issues.apache.org/jira/browse/SPARK-45772 > Project: Spark > Issue Type: Task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Utkarsh Agarwal >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-44705 introduced changes to the > evaluation of the `input_file_name()` expression in the presence of Python > UDFs. This was done to maintain the behavior of the `input_file_name()` > expression when the execution model of the PythonRunner was made > single-threaded by https://issues.apache.org/jira/browse/SPARK-44705. We > should add additional test coverage for `input_file_name()` + Python UDFs to > prevent future breakages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45772) Add additional test coverage for input_file_name_expr
Utkarsh Agarwal created SPARK-45772: --- Summary: Add additional test coverage for input_file_name_expr Key: SPARK-45772 URL: https://issues.apache.org/jira/browse/SPARK-45772 Project: Spark Issue Type: Task Components: PySpark Affects Versions: 4.0.0 Reporter: Utkarsh Agarwal https://issues.apache.org/jira/browse/SPARK-44705 introduced changes to the evaluation of the `input_file_name()` expression in the presence of Python UDFs. This was done to maintain the behavior of the `input_file_name()` expression when the execution model of the PythonRunner was made single-threaded by https://issues.apache.org/jira/browse/SPARK-44705. We should add additional test coverage for `input_file_name()` + Python UDFs to prevent future breakages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45771: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-45771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45771: --- Labels: pull-request-available (was: ) > Enable spark.eventLog.rolling.enabled by default > > > Key: SPARK-45771 > URL: https://issues.apache.org/jira/browse/SPARK-45771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45771) Enable spark.eventLog.rolling.enabled by default
Dongjoon Hyun created SPARK-45771: - Summary: Enable spark.eventLog.rolling.enabled by default Key: SPARK-45771 URL: https://issues.apache.org/jira/browse/SPARK-45771 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
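For readers who haven't used the feature this ticket enables by default: event log rolling splits an application's single, ever-growing event log into size-bounded chunks that the history server can later compact. A minimal sketch of the related settings in spark-defaults.conf (values are illustrative examples, not the defaults this change introduces):

```
# spark-defaults.conf -- illustrative values, not the defaults set by this ticket
spark.eventLog.enabled                                true
spark.eventLog.rolling.enabled                        true
# Roll over to a new event log file once the current one exceeds this size.
spark.eventLog.rolling.maxFileSize                    128m
# History-server side: number of recent rolled files to retain after compaction.
spark.history.fs.eventLog.rolling.maxFilesToRetain    2
```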
[jira] [Resolved] (SPARK-45767) Delete `TimeStampedHashMap` and its UT
[ https://issues.apache.org/jira/browse/SPARK-45767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45767. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43633 [https://github.com/apache/spark/pull/43633] > Delete `TimeStampedHashMap` and its UT > -- > > Key: SPARK-45767 > URL: https://issues.apache.org/jira/browse/SPARK-45767 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45767) Delete `TimeStampedHashMap` and its UT
[ https://issues.apache.org/jira/browse/SPARK-45767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45767: Assignee: BingKun Pan > Delete `TimeStampedHashMap` and its UT > -- > > Key: SPARK-45767 > URL: https://issues.apache.org/jira/browse/SPARK-45767 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45694: --- Labels: pull-request-available (was: ) > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45770) Fix column resolution in DataFrame.drop
[ https://issues.apache.org/jira/browse/SPARK-45770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45770: --- Labels: pull-request-available (was: ) > Fix column resolution in DataFrame.drop > --- > > Key: SPARK-45770 > URL: https://issues.apache.org/jira/browse/SPARK-45770 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41454) Support Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-41454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41454: --- Labels: pull-request-available (was: ) > Support Python 3.11 > --- > > Key: SPARK-41454 > URL: https://issues.apache.org/jira/browse/SPARK-41454 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45769) data retrieval fails on executors with spark connect
Steven Ottens created SPARK-45769: - Summary: data retrieval fails on executors with spark connect Key: SPARK-45769 URL: https://issues.apache.org/jira/browse/SPARK-45769 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Steven Ottens We have an OpenShift cluster with Spark and JupyterHub and we use Spark-Connect to access Spark from within Jupyter. This worked fine with Spark 3.4.1. However, after upgrading to Spark 3.5.0 we were not able to access any data in our Delta Tables through Spark. Initially I assumed it was a bug in Delta: [https://github.com/delta-io/delta/issues/2235] The actual error is {code:java} SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 13) (172.31.15.72 executor 4): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD{code} However, after further investigation I discovered that this is a regression in Spark 3.5.0. The issue is similar to SPARK-36917; however, I am not using any custom functions or any classes other than spark-connect, and this setup used to work in 3.4.1. The issue only occurs when remote executors are used in a Kubernetes environment. Running a plain Spark-Connect server, e.g. {code:java} ./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.0{code} doesn't produce the error. The issue occurs both in a full OpenShift cluster and in a tiny minikube setup. The steps to reproduce are based on the minikube setup. You need to have a minimal Spark 3.5.0 setup with 1 driver and at least 1 executor and use Python to access data through Spark. 
The query I used to test this is {code:java} from pyspark.sql import SparkSession logFile = '/opt/spark/work-dir/data.csv' spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate() df = spark.read.csv(logFile) df.count() {code} However, it doesn't matter whether the data is local or remote on S3 storage, nor whether the data is plain text, CSV, or a Delta Table. h3. Steps to reproduce: # Install minikube # Create a service account 'spark' {code:java} kubectl create sa spark{code} # Bind the 'edit' role to the service account {code:java} kubectl create rolebinding spark-edit \ --clusterrole=edit \ --serviceaccount=default:spark \ --namespace=default{code} # Create a service for spark {code:java} kubectl create -f service.yml{code} # Create a Spark-Connect deployment with the default Spark Docker image: [https://hub.docker.com/_/spark] (do change the deployment.yml to point to the Kubernetes API endpoint) {code:java} kubectl create -f deployment.yml{code} # Add data to both the executor and the driver pods, e.g. log in on the terminal of the pods and run on both pods {code:java} touch data.csv echo id,name > data.csv echo 1,2 >> data.csv {code} # Start a spark-remote session to access the newly created data. I logged in on the driver pod and installed the necessary Python packages: {code:java} python3 -m pip install pandas pyspark grpcio-tools grpcio-status pyarrow{code} Started a Python shell and executed: {code:java} from pyspark.sql import SparkSession logFile = '/opt/spark/work-dir/data.csv' spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate() df = spark.read.csv(logFile) df.count() {code} h3. Necessary files: Service.yml: {code:java} apiVersion: v1 kind: Service metadata: labels: app: spark-connect name: spark-connect namespace: default spec: ipFamilies: - IPv4 ports: - name: connect-grpc protocol: TCP port: 15002 # Port the service listens on. 
targetPort: 15002 # Port on the backing pods to which the service forwards connections - name: sparkui protocol: TCP port: 4040 # Port the service listens on. targetPort: 4040 # Port on the backing pods to which the service forwards connections - name: spark-rpc protocol: TCP port: 7078 # Port the service listens on. targetPort: 7078 # Port on the backing pods to which the service forwards connections - name: blockmanager protocol: TCP port: 7079 # Port the service listens on. targetPort: 7079 # Port on the backing pods to which the service forwards connections internalTrafficPolicy: Cluster type: ClusterIP ipFamilyPolicy: SingleStack sessionAffinity: None selector: app: spark-connect {code} deployment.yml: (do replace the spark.master URL with the correct one for your setup) {code:java} kind: Deployment apiVersion: apps/v1 metadata: name: spark-connect namespace: default uid:
[jira] [Updated] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45768: --- Labels: pull-request-available (was: ) > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The faulthandler feature within PySpark is really useful, especially for debugging > errors that the regular Python interpreter cannot catch out of the box, such as > segmentation fault errors; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45768) Make faulthandler a runtime configuration for Python execution in SQL
[ https://issues.apache.org/jira/browse/SPARK-45768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45768: - Summary: Make faulthandler a runtime configuration for Python execution in SQL (was: Make faulthanlder a runtime configuration for Python execution in SQL) > Make faulthandler a runtime configuration for Python execution in SQL > - > > Key: SPARK-45768 > URL: https://issues.apache.org/jira/browse/SPARK-45768 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The faulthandler feature within PySpark is really useful, especially for debugging > errors that the regular Python interpreter cannot catch out of the box, such as > segmentation fault errors; see also > https://github.com/apache/spark/pull/43600. It would be very useful to > convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45768) Make faulthanlder a runtime configuration for Python execution in SQL
Hyukjin Kwon created SPARK-45768: Summary: Make faulthanlder a runtime configuration for Python execution in SQL Key: SPARK-45768 URL: https://issues.apache.org/jira/browse/SPARK-45768 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The faulthandler feature within PySpark is really useful, especially for debugging errors that the regular Python interpreter cannot catch out of the box, such as segmentation fault errors; see also https://github.com/apache/spark/pull/43600. It would be very useful to convert this into a runtime configuration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
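For context on what this ticket makes configurable at runtime: PySpark's worker-side faulthandler support wraps Python's stdlib faulthandler module, which writes stack traces straight to a file descriptor so they survive C-level crashes (e.g. segmentation faults) that would otherwise kill the interpreter without any Python traceback. The sketch below shows only the stdlib mechanism; the pre-existing static conf name (`spark.python.worker.faulthandler.enabled`) is stated from memory and should be checked against the PR.

```python
import faulthandler
import tempfile

# faulthandler writes the stack of every live thread directly to a file
# descriptor, bypassing Python-level I/O. That is why it still produces a
# traceback when a segfault would kill the interpreter before a normal
# traceback could be printed. (Related Spark conf, an assumption here:
# spark.python.worker.faulthandler.enabled)
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    dump = f.read()

# Each frame in the dump appears as: File "<path>", line N in <name>
print("File" in dump)
```

In a Spark worker, the same dump is triggered on crash and forwarded to the executor logs, which is what makes the feature valuable for debugging segfaulting UDFs.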