[jira] [Updated] (SPARK-48536) Cache user specified schema in applyInPandas and applyInArrow

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48536:
---
Labels: pull-request-available  (was: )

> Cache user specified schema in applyInPandas and applyInArrow
> -
>
> Key: SPARK-48536
> URL: https://issues.apache.org/jira/browse/SPARK-48536
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48508) Client Side RPC optimization for Spark Connect

2024-06-05 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-48508:
--
Labels:   (was: pull-request-available)

> Client Side RPC optimization for Spark Connect
> --
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48537) Support Hive UDAF inherit from `GenericUDAFResolver` and `GenericUDAFResolver2`

2024-06-05 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48537:
---

 Summary: Support Hive UDAF inherit from `GenericUDAFResolver` and 
`GenericUDAFResolver2`
 Key: SPARK-48537
 URL: https://issues.apache.org/jira/browse/SPARK-48537
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-48537) Support Hive UDAF inherit from `GenericUDAFResolver` and `GenericUDAFResolver2`

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48537:
---
Labels: pull-request-available  (was: )

> Support Hive UDAF inherit from `GenericUDAFResolver` and 
> `GenericUDAFResolver2`
> ---
>
> Key: SPARK-48537
> URL: https://issues.apache.org/jira/browse/SPARK-48537
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48535) Update doc to log warning for join null related config usage

2024-06-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48535.
--
Fix Version/s: 3.5.2, 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46875
[https://github.com/apache/spark/pull/46875]

> Update doc to log warning for join null related config usage
> 
>
> Key: SPARK-48535
> URL: https://issues.apache.org/jira/browse/SPARK-48535
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> Update doc to log warning for join null related config usage






[jira] [Updated] (SPARK-48537) Support `Hive UDAF` inherited from `GenericUDAFResolver` or `GenericUDAFResolver2`

2024-06-05 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-48537:

Summary: Support `Hive UDAF` inherited from `GenericUDAFResolver` or 
`GenericUDAFResolver2`  (was: Support Hive UDAF inherit from 
`GenericUDAFResolver` and `GenericUDAFResolver2`)

> Support `Hive UDAF` inherited from `GenericUDAFResolver` or 
> `GenericUDAFResolver2`
> --
>
> Key: SPARK-48537
> URL: https://issues.apache.org/jira/browse/SPARK-48537
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48537) Support `Hive UDAF` inherited from `GenericUDAFResolver` or `GenericUDAFResolver2`

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48537:
--

Assignee: (was: Apache Spark)

> Support `Hive UDAF` inherited from `GenericUDAFResolver` or 
> `GenericUDAFResolver2`
> --
>
> Key: SPARK-48537
> URL: https://issues.apache.org/jira/browse/SPARK-48537
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48538) Avoid memory leak of bonecp as described in HIVE-15551

2024-06-05 Thread Kent Yao (Jira)
Kent Yao created SPARK-48538:


 Summary: Avoid memory leak of bonecp as described in HIVE-15551
 Key: SPARK-48538
 URL: https://issues.apache.org/jira/browse/SPARK-48538
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Updated] (SPARK-48538) Avoid memory leak of bonecp as described in HIVE-15551

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48538:
---
Labels: pull-request-available  (was: )

> Avoid memory leak of bonecp as described in HIVE-15551
> --
>
> Key: SPARK-48538
> URL: https://issues.apache.org/jira/browse/SPARK-48538
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48539) Upgrade docker-java to 3.3.6

2024-06-05 Thread Wei Guo (Jira)
Wei Guo created SPARK-48539:
---

 Summary: Upgrade docker-java to 3.3.6
 Key: SPARK-48539
 URL: https://issues.apache.org/jira/browse/SPARK-48539
 Project: Spark
  Issue Type: Improvement
  Components: Spark Docker
Affects Versions: 4.0.0
Reporter: Wei Guo
 Fix For: 4.0.0









[jira] [Resolved] (SPARK-48536) Cache user specified schema in applyInPandas and applyInArrow

2024-06-05 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48536.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46877
[https://github.com/apache/spark/pull/46877]

> Cache user specified schema in applyInPandas and applyInArrow
> -
>
> Key: SPARK-48536
> URL: https://issues.apache.org/jira/browse/SPARK-48536
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
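For context, a minimal sketch of what "user specified schema" means here, assuming a DataFrame {{df}} with columns {{id}} and {{v}} (the grouping logic is illustrative, not from the ticket):

{code:python}
import pandas as pd

def center(pdf: pd.DataFrame) -> pd.DataFrame:
    # Subtract the per-group mean from column "v".
    return pdf.assign(v=pdf.v - pdf.v.mean())

# The schema string below is the user-specified schema this ticket caches on
# the Spark Connect client side so it need not be re-derived on repeated use
# (a reading of the ticket title; details are in the linked PR).
result = df.groupBy("id").applyInPandas(center, schema="id long, v double")
{code}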







[jira] [Created] (SPARK-48540) Avoid ivy output loading settings to stdout

2024-06-05 Thread dzcxzl (Jira)
dzcxzl created SPARK-48540:
--

 Summary: Avoid ivy output loading settings to stdout
 Key: SPARK-48540
 URL: https://issues.apache.org/jira/browse/SPARK-48540
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: dzcxzl









[jira] [Updated] (SPARK-48540) Avoid ivy output loading settings to stdout

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48540:
---
Labels: pull-request-available  (was: )

> Avoid ivy output loading settings to stdout
> ---
>
> Key: SPARK-48540
> URL: https://issues.apache.org/jira/browse/SPARK-48540
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48541) Assign specific exit code for executors killed by TaskReaper

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48541:
---
Labels: pull-request-available  (was: )

> Assign specific exit code for executors killed by TaskReaper
> 
>
> Key: SPARK-48541
> URL: https://issues.apache.org/jira/browse/SPARK-48541
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When TaskReaper kills an executor, the executor loss reason is reported as 
> "Command exited with code 50", which is the default exit code for an uncaught 
> exception. We would like a specific exit code for executors killed by 
> TaskReaper so that we can better monitor these cases.
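To illustrate the monitoring motivation, a hedged sketch; the dedicated exit-code value below is hypothetical, and only 50 (the default uncaught-exception code) comes from the ticket:

{code:python}
# Hypothetical classifier a platform monitor might apply to executor exit codes.
TASK_REAPER_KILLED = 57  # placeholder; the actual value would be chosen in the PR

def classify_executor_exit(code: int) -> str:
    if code == 50:  # default exit code for an uncaught exception
        return "uncaught exception"
    if code == TASK_REAPER_KILLED:
        return "killed by TaskReaper"
    return "other (exit code {})".format(code)
{code}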






[jira] [Created] (SPARK-48542) Give snapshotStartBatchId and snapshotPartitionId to the state data source

2024-06-05 Thread Yuchen Liu (Jira)
Yuchen Liu created SPARK-48542:
--

 Summary: Give snapshotStartBatchId and snapshotPartitionId to the 
state data source
 Key: SPARK-48542
 URL: https://issues.apache.org/jira/browse/SPARK-48542
 Project: Spark
  Issue Type: New Feature
  Components: SQL, Structured Streaming
Affects Versions: 4.0.0
 Environment: This should work for both HDFS state store and RocksDB 
state store.
Reporter: Yuchen Liu


Right now, to read a given version of the state data, the state source finds 
the most recent snapshot file at or before that version and reconstructs the 
state by replaying the delta files on top of it. In some debugging scenarios, 
users need more granular control over how the state is reconstructed; for 
example, they may want to start from a specific snapshot instead of the 
closest one. One use case is checking whether a snapshot was corrupted after 
being committed.

This task introduces two options, {{snapshotStartBatchId}} and 
{{snapshotPartitionId}}, to the state data source. By specifying them, users 
can control the snapshot's starting batch id and the state's partition id.
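A minimal usage sketch of the proposed options; the {{statestore}} format name, the {{path}} and {{batchId}} options, and the checkpoint location are assumptions here, while the two new option names come from this ticket:

{code:python}
# Read partition 0 of the state as of batch 20, forcing reconstruction to start
# from the batch-10 snapshot instead of the closest one (sketch only).
state_df = (
    spark.read.format("statestore")
    .option("path", "/tmp/checkpoint")    # checkpoint location (assumed)
    .option("batchId", 20)                # state version to read (assumed option)
    .option("snapshotStartBatchId", 10)   # introduced by this ticket
    .option("snapshotPartitionId", 0)     # introduced by this ticket
    .load()
)
{code}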






[jira] [Created] (SPARK-48543) Track invalid unsafe row exception explicitly as error class

2024-06-05 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-48543:
--

 Summary: Track invalid unsafe row exception explicitly as error 
class
 Key: SPARK-48543
 URL: https://issues.apache.org/jira/browse/SPARK-48543
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Anish Shrigondekar


Track invalid unsafe row exception explicitly as error class






[jira] [Updated] (SPARK-48543) Track invalid unsafe row exception explicitly as error class

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48543:
---
Labels: pull-request-available  (was: )

> Track invalid unsafe row exception explicitly as error class
> 
>
> Key: SPARK-48543
> URL: https://issues.apache.org/jira/browse/SPARK-48543
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
>
> Track invalid unsafe row exception explicitly as error class






[jira] [Commented] (SPARK-41469) Task rerun on decommissioned executor can be avoided if shuffle data has migrated

2024-06-05 Thread Yeachan Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852555#comment-17852555
 ] 

Yeachan Park commented on SPARK-41469:
--

Maybe a stupid question, but I thought the whole point of shuffle migration was 
that the shuffle data doesn't have to be re-computed. Was nothing being done with 
the migrated shuffle blocks prior to this PR?

> Task rerun on decommissioned executor can be avoided if shuffle data has 
> migrated
> -
>
> Key: SPARK-41469
> URL: https://issues.apache.org/jira/browse/SPARK-41469
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, we always rerun a finished shuffle map task if it once ran on 
> the lost executor. However, when the executor loss is caused by 
> decommissioning, the shuffle data might have been migrated, so the task 
> doesn't need to rerun.






[jira] [Created] (SPARK-48545) Create from_avro SQL function to match PySpark equivalent

2024-06-05 Thread Daniel (Jira)
Daniel created SPARK-48545:
--

 Summary: Create from_avro SQL function to match PySpark equivalent
 Key: SPARK-48545
 URL: https://issues.apache.org/jira/browse/SPARK-48545
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Daniel


The PySpark API is here: 
https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/python/pyspark/sql/avro/functions.py#L35
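For reference, the existing PySpark call that the SQL function would mirror, with the proposed SQL spelling sketched in a comment (the SQL line is this ticket's proposal, not a shipped function; {{df}} and the schema are illustrative):

{code:python}
from pyspark.sql.avro.functions import from_avro

avro_schema = '{"type": "record", "name": "event", "fields": [{"name": "id", "type": "long"}]}'

# Existing PySpark API (see the link above): parse a binary Avro column.
parsed = df.select(from_avro(df.value, avro_schema).alias("event"))

# Proposed SQL equivalent (sketch; the exact signature is to be decided):
#   SELECT from_avro(value, '<json schema>') FROM events
{code}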






[jira] [Commented] (SPARK-48545) Create from_avro SQL function to match PySpark equivalent

2024-06-05 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852593#comment-17852593
 ] 

Gengliang Wang commented on SPARK-48545:


+1 for having such functions (from_avro and to_avro)

> Create from_avro SQL function to match PySpark equivalent
> -
>
> Key: SPARK-48545
> URL: https://issues.apache.org/jira/browse/SPARK-48545
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>
> The PySpark API is here: 
> https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/python/pyspark/sql/avro/functions.py#L35






[jira] [Created] (SPARK-48546) Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression

2024-06-05 Thread Daniel (Jira)
Daniel created SPARK-48546:
--

 Summary: Fix ExpressionEncoder after replacing 
NullPointerExceptions with proper error classes in AssertNotNull expression
 Key: SPARK-48546
 URL: https://issues.apache.org/jira/browse/SPARK-48546
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Daniel









[jira] [Created] (SPARK-48547) Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits

2024-06-05 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-48547:
--

 Summary: Add opt-in flag to have SparkSubmit automatically call 
System.exit after user code main method exits
 Key: SPARK-48547
 URL: https://issues.apache.org/jira/browse/SPARK-48547
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 4.0.0
Reporter: Josh Rosen
Assignee: Josh Rosen


This PR proposes to add a new flag, `spark.submit.callSystemExitOnMainExit` 
(default false), which, when true, instructs SparkSubmit to call System.exit() 
in the JVM once the user code's main method has exited (for Java / Scala jobs) 
or once the user's Python or R script has exited.

This is intended to address a longstanding issue where SparkSubmit invocations 
might hang after user code has completed:

[According to Java’s java.lang.Runtime 
docs|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown]:
{quote}The Java Virtual Machine initiates the _shutdown sequence_ in response 
to one of several events:
 # when the number of 
[live|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()]
 non-daemon threads drops to zero for the first time (see note below on the JNI 
Invocation API);

 # when the {{Runtime.exit}} or {{System.exit}} method is called for the first 
time; or

 # when some external event occurs, such as an interrupt or a signal is 
received from the operating system.{quote}
For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call 
{{System.exit()}} if the user program exits with a non-zero exit code (see 
[python|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104]
 and 
[R|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111]
 runner code).

But for Java and Scala programs, plus any _successful_ R or Python programs, 
Spark will _not_ automatically call System.exit.

In those situations, the JVM will only shut down when, via event (1), all 
non-[daemon|https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java]
 threads have exited (unless the job is cancelled and sent an external 
interrupt / kill signal, corresponding to event (3)).

Thus, *non-daemon* threads might cause logically-completed spark-submit jobs to 
hang rather than complete.

The non-daemon threads are not always under Spark's own control and may not 
necessarily be cleaned up by SparkContext.stop().

Thus, it is useful to have an opt-in functionality to have SparkSubmit 
automatically call `System.exit()` upon main method exit (which usually, but 
not always, corresponds to job completion): this option will allow users and 
platform operators to enforce System.exit() calls without having to modify 
individual jobs' code.
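A hypothetical invocation once the flag lands; the flag name comes from this description, while the script name is illustrative:

{code}
spark-submit \
  --conf spark.submit.callSystemExitOnMainExit=true \
  my_streaming_job.py
{code}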






[jira] [Updated] (SPARK-48547) Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits

2024-06-05 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-48547:
---
Description: 
This PR proposes to add a new flag, `spark.submit.callSystemExitOnMainExit` 
(default false), which, when true, instructs SparkSubmit to call System.exit() 
in the JVM once the user code's main method has exited (for Java / Scala jobs) 
or once the user's Python or R script has exited.

This is intended to address a longstanding issue where SparkSubmit invocations 
might hang after user code has completed:

[According to Java’s java.lang.Runtime 
docs|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown]:
{quote}The Java Virtual Machine initiates the _shutdown sequence_ in response 
to one of several events:
 # when the number of 
[live|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()]
 non-daemon threads drops to zero for the first time (see note below on the JNI 
Invocation API);

 # when the {{Runtime.exit}} or {{System.exit}} method is called for the first 
time; or

 # when some external event occurs, such as an interrupt or a signal is 
received from the operating system.{quote}
For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call 
{{System.exit()}} if the user program exits with a non-zero exit code (see 
[python|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104]
 and 
[R|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111]
 runner code).

But for Java and Scala programs, plus any _successful_ R or Python programs, 
Spark will _not_ automatically call System.exit.

In those situations, the JVM will only shut down when, via event (1), all 
non-[daemon|https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java]
 threads have exited (unless the job is cancelled and sent an external 
interrupt / kill signal, corresponding to event (3)).

Thus, *non-daemon* threads might cause logically-completed spark-submit jobs to 
hang rather than complete.

The non-daemon threads are not always under Spark's own control and may not 
necessarily be cleaned up by SparkContext.stop().

Thus, it is useful to have an opt-in functionality to have SparkSubmit 
automatically call `System.exit()` upon main method exit (which usually, but 
not always, corresponds to job completion): this option will allow users and 
data platform operators to enforce System.exit() calls without having to modify 
individual jobs' code.

  was:
This PR proposes to add a new flag, `spark.submit.callSystemExitOnMainExit` 
(default false), which, when true, instructs SparkSubmit to call System.exit() 
in the JVM once the user code's main method has exited (for Java / Scala jobs) 
or once the user's Python or R script has exited.

This is intended to address a longstanding issue where SparkSubmit invocations 
might hang after user code has completed:

[According to Java’s java.lang.Runtime 
docs|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown]:
{quote}The Java Virtual Machine initiates the _shutdown sequence_ in response 
to one of several events:
 # when the number of 
[live|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()]
 non-daemon threads drops to zero for the first time (see note below on the JNI 
Invocation API);

 # when the {{Runtime.exit}} or {{System.exit}} method is called for the first 
time; or

 # when some external event occurs, such as an interrupt or a signal is 
received from the operating system.{quote}
For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call 
{{System.exit()}} if the user program exits with a non-zero exit code (see 
[python|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104]
 and 
[R|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111]
 runner code).

But for Java and Scala programs, plus any _successful_ R or Python programs, 
Spark will _not_ automatically call System.exit.

In those situations, the JVM will only shut down when, via event (1), all 
non-[daemon|https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java]
 threads have exited (unless the job is cancelled and sent an external 
interrupt / kill signal, corresponding to event (3)).

Thus, *non-daemon* threads might cause logically-completed spark-submit jobs to 
hang rather than complete.

The non-daemon threads are not always under Spark's own control and may not 
necessarily be cleaned up by SparkContext.stop().

Thus, it is useful to have an opt-in functionality to have SparkSubmit 
automatically call `System.exit()` upon main method exit (which usually, but 
not always, corresponds to job completion): this option will allow users and 
platform operators to enforce System.exit() calls without having to modify 
individual jobs' code.

[jira] [Updated] (SPARK-48546) Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48546:
---
Labels: pull-request-available  (was: )

> Fix ExpressionEncoder after replacing NullPointerExceptions with proper error 
> classes in AssertNotNull expression
> -
>
> Key: SPARK-48546
> URL: https://issues.apache.org/jira/browse/SPARK-48546
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48547) Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48547:
---
Labels: pull-request-available  (was: )

> Add opt-in flag to have SparkSubmit automatically call System.exit after user 
> code main method exits
> 
>
> Key: SPARK-48547
> URL: https://issues.apache.org/jira/browse/SPARK-48547
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 4.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
>
> This PR proposes to add a new flag, `spark.submit.callSystemExitOnMainExit` 
> (default false), which, when true, instructs SparkSubmit to call System.exit() 
> in the JVM once the user code's main method has exited (for Java / Scala jobs) 
> or once the user's Python or R script has exited.
> This is intended to address a longstanding issue where SparkSubmit 
> invocations might hang after user code has completed:
> [According to Java’s java.lang.Runtime 
> docs|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Runtime.html#shutdown]:
> {quote}The Java Virtual Machine initiates the _shutdown sequence_ in response 
> to one of several events:
>  # when the number of 
> [live|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Thread.html#isAlive()]
>  non-daemon threads drops to zero for the first time (see note below on the 
> JNI Invocation API);
>  # when the {{Runtime.exit}} or {{System.exit}} method is called for the 
> first time; or
>  # when some external event occurs, such as an interrupt or a signal is 
> received from the operating system.{quote}
> For Python and R programs, SparkSubmit’s PythonRunner and RRunner will call 
> {{System.exit()}} if the user program exits with a non-zero exit code (see 
> [python|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L101-L104]
>  and 
> [R|https://github.com/apache/spark/blob/d5c33c6bfb5757b243fc8e1734daeaa4fe3b9b32/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L109-L111]
>  runner code).
> But for Java and Scala programs, plus any _successful_ R or Python programs, 
> Spark will _not_ automatically call System.exit.
> In those situations, the JVM will only shut down when, via event (1), all 
> non-[daemon|https://stackoverflow.com/questions/2213340/what-is-a-daemon-thread-in-java]
>  threads have exited (unless the job is cancelled and sent an external 
> interrupt / kill signal, corresponding to event (3)).
> Thus, *non-daemon* threads might cause logically-completed spark-submit jobs 
> to hang rather than complete.
> The non-daemon threads are not always under Spark's own control and may not 
> necessarily be cleaned up by SparkContext.stop().
> Thus, it is useful to have an opt-in functionality to have SparkSubmit 
> automatically call `System.exit()` upon main method exit (which usually, but 
> not always, corresponds to job completion): this option will allow users and 
> data platform operators to enforce System.exit() calls without having to 
> modify individual jobs' code.






[jira] [Created] (SPARK-48548) Update LICENSE/NOTICE for spark-core with shaded dependencies

2024-06-05 Thread Kent Yao (Jira)
Kent Yao created SPARK-48548:


 Summary: Update LICENSE/NOTICE for spark-core with shaded 
dependencies
 Key: SPARK-48548
 URL: https://issues.apache.org/jira/browse/SPARK-48548
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Updated] (SPARK-48548) Update LICENSE/NOTICE for spark-core with shaded dependencies

2024-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48548:
---
Labels: pull-request-available  (was: )

> Update LICENSE/NOTICE for spark-core with shaded dependencies
> -
>
> Key: SPARK-48548
> URL: https://issues.apache.org/jira/browse/SPARK-48548
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48539) Upgrade docker-java to 3.3.6

2024-06-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-48539:
-
Parent: SPARK-47361
Issue Type: Sub-task  (was: Improvement)

> Upgrade docker-java to 3.3.6
> 
>
> Key: SPARK-48539
> URL: https://issues.apache.org/jira/browse/SPARK-48539
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 4.0.0
>Reporter: Wei Guo
>Assignee: Wei Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Comment Edited] (SPARK-41469) Task rerun on decommissioned executor can be avoided if shuffle data has migrated

2024-06-05 Thread Yeachan Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852555#comment-17852555
 ] 

Yeachan Park edited comment on SPARK-41469 at 6/6/24 6:28 AM:
--

Maybe a silly question, but I thought the whole point of shuffle migration was 
that the shuffle data doesn't have to be re-computed. Was nothing being done with 
the migrated shuffle blocks prior to this PR? I'm probably misunderstanding 
something.


was (Author: JIRAUSER288356):
Maybe a stupid question, but I thought the whole point of shuffle migration was 
that the shuffle data doesn't have to be re-computed. Was nothing being done with 
the migrated shuffle blocks prior to this PR?

> Task rerun on decommissioned executor can be avoided if shuffle data has 
> migrated
> -
>
> Key: SPARK-41469
> URL: https://issues.apache.org/jira/browse/SPARK-41469
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, we always rerun a finished shuffle map task if it once ran on 
> the lost executor. However, when the executor loss is caused by 
> decommissioning, the shuffle data might have been migrated, so the task 
> doesn't need to rerun.






[jira] [Assigned] (SPARK-48540) Avoid ivy output loading settings to stdout

2024-06-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48540:


Assignee: dzcxzl

> Avoid ivy output loading settings to stdout
> ---
>
> Key: SPARK-48540
> URL: https://issues.apache.org/jira/browse/SPARK-48540
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48540) Avoid ivy output loading settings to stdout

2024-06-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48540.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46882
[https://github.com/apache/spark/pull/46882]

> Avoid ivy output loading settings to stdout
> ---
>
> Key: SPARK-48540
> URL: https://issues.apache.org/jira/browse/SPARK-48540
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47857) Utilize java.sql.RowId.getBytes API directly for UTF8String

2024-06-05 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-47857.
--
Resolution: Not A Problem

> Utilize java.sql.RowId.getBytes API directly for UTF8String
> ---
>
> Key: SPARK-47857
> URL: https://issues.apache.org/jira/browse/SPARK-47857
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48549) Restrict the number of parameters for function `sentences` to 1 or 3

2024-06-05 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48549:
---

 Summary: Restrict the number of parameters for function 
`sentences` to 1 or 3
 Key: SPARK-48549
 URL: https://issues.apache.org/jira/browse/SPARK-48549
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan





