[
https://issues.apache.org/jira/browse/SPARK-50186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruochen Zou updated SPARK-50186:
--------------------------------
Description:
Currently, the Executor startup script hardcodes the addition of
{{{}-XX:OnOutOfMemoryError='kill %p'{}}}, which causes the process to be killed
when the Executor encounters an OOM (Out Of Memory) error.
{code:java}
// code in YarnSparkHadoopUtil
private[yarn] def addOutOfMemoryErrorArgument(javaOpts: ListBuffer[String]): Unit = {
  if (!javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {
    if (Utils.isWindows) {
      javaOpts += escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID %%%%p")
    } else {
      javaOpts += "-XX:OnOutOfMemoryError='kill %p'"
    }
  }
}{code}
As a result, the YarnAllocator receives an exit code of 143 and is unable to
accurately determine the reason for the Executor's termination based on this
exit code. Moreover, the CoarseGrainedExecutorBackend cannot guarantee that
StatusUpdate messages are sent to the Driver before the process is killed.
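For context on why 143 is ambiguous, a small illustrative sketch (not Spark source):

```scala
// On POSIX systems a process terminated by a signal exits with
// 128 + the signal number. 'kill %p' sends SIGTERM (15), so YARN
// observes 143 -- the same exit code it sees for any externally
// terminated container (e.g. preemption), which is why the
// YarnAllocator cannot attribute it specifically to an OOM.
val SIGTERM = 15
val oomKillExitCode = 128 + SIGTERM // 143, as reported to the YarnAllocator
```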
Could we remove this setting, since users can set it via the
{{spark.executor.extraJavaOptions}} parameter if necessary?
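If the hardcoded flag were removed, users who still want the kill-on-OOM behavior could opt in explicitly with the same value that is hardcoded today, e.g. (an illustrative spark-submit invocation, not a required form):

```shell
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:OnOutOfMemoryError='kill %p'" \
  ...
```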
Executor log:
!image-2024-10-31-14-14-06-723.png!
Driver log:
!image-2024-10-31-14-14-31-701.png!
was:
Currently, the Executor startup script hardcodes the addition of
{{{}-XX:OnOutOfMemoryError='kill %p'{}}}, which causes the process to be killed
when the Executor encounters an OOM (Out Of Memory) error. As a result, the
YarnAllocator receives an exit code of 143 and is unable to accurately
determine the reason for the Executor's termination based on this exit code.
Moreover, the CoarseGrainedExecutorBackend cannot guarantee that StatusUpdate
messages are sent to the Driver before the process is killed.
Could we remove this setting, since users can set it via the
{{spark.executor.extraJavaOptions}} parameter if necessary?
Executor log:
!image-2024-10-31-14-02-10-261.png!
Driver log:
!image-2024-10-31-14-11-22-952.png!
> Remove Hardcoded OnOutOfMemoryError Setting in Executor Startup Script
> ----------------------------------------------------------------------
>
> Key: SPARK-50186
> URL: https://issues.apache.org/jira/browse/SPARK-50186
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.1, 4.0.0
> Reporter: Ruochen Zou
> Priority: Minor
> Attachments: image-2024-10-31-14-14-06-723.png,
> image-2024-10-31-14-14-31-701.png
>
>
> Currently, the Executor startup script hardcodes the addition of
> {{{}-XX:OnOutOfMemoryError='kill %p'{}}}, which causes the process to be
> killed when the Executor encounters an OOM (Out Of Memory) error.
> {code:java}
> // code in YarnSparkHadoopUtil
> private[yarn] def addOutOfMemoryErrorArgument(javaOpts: ListBuffer[String]): Unit = {
>   if (!javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {
>     if (Utils.isWindows) {
>       javaOpts += escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID %%%%p")
>     } else {
>       javaOpts += "-XX:OnOutOfMemoryError='kill %p'"
>     }
>   }
> }{code}
> As a result, the YarnAllocator receives an exit code of 143 and is unable to
> accurately determine the reason for the Executor's termination based on this
> exit code. Moreover, the CoarseGrainedExecutorBackend cannot guarantee that
> StatusUpdate messages are sent to the Driver before the process is killed.
> Could we remove this setting, since users can set it via the
> {{spark.executor.extraJavaOptions}} parameter if necessary?
> Executor log:
> !image-2024-10-31-14-14-06-723.png!
>
> Driver log:
> !image-2024-10-31-14-14-31-701.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]