[ 
https://issues.apache.org/jira/browse/SPARK-50186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruochen Zou updated SPARK-50186:
--------------------------------
    Description: 
Currently, the Executor startup script on YARN hardcodes the addition of 
{{-XX:OnOutOfMemoryError='kill %p'}} unless the user has already supplied an 
{{-XX:OnOutOfMemoryError}} option, so the process is killed as soon as the 
Executor encounters an OOM (Out Of Memory) error.
{code:java}
// code in YarnSparkHadoopUtil
private[yarn] def addOutOfMemoryErrorArgument(javaOpts: ListBuffer[String]): Unit = {
  if (!javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {
    if (Utils.isWindows) {
      javaOpts += escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID %%%%p")
    } else {
      javaOpts += "-XX:OnOutOfMemoryError='kill %p'"
    }
  }
}{code}
As a result, when an Executor hits an OOM, {{kill %p}} sends SIGTERM and the 
process exits with code 143 (128 + 15). The YarnAllocator cannot accurately 
determine why the Executor terminated from this exit code alone. Moreover, 
the CoarseGrainedExecutorBackend cannot guarantee that StatusUpdate messages 
are sent to the Driver before the process is killed.
Could we remove this hardcoded setting? Users who need it can still set it 
via the {{spark.executor.extraJavaOptions}} parameter.
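
To illustrate the two points above, here is a minimal, self-contained sketch that does not depend on Spark. {{OomOptionSketch}}, {{addDefaultOomHandler}} and {{signalFromExitCode}} are hypothetical names introduced only for this example; the first mirrors the non-Windows branch of the snippet above, and the second assumes the common convention that an exit code above 128 means "terminated by signal (code - 128)".

```scala
import scala.collection.mutable.ListBuffer

object OomOptionSketch {
  // Hypothetical stand-in for the non-Windows branch of
  // addOutOfMemoryErrorArgument: the default handler is added only when
  // the options do not already contain an OnOutOfMemoryError setting.
  def addDefaultOomHandler(javaOpts: ListBuffer[String]): Unit = {
    if (!javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {
      javaOpts += "-XX:OnOutOfMemoryError='kill %p'"
    }
  }

  // Assumption: exit codes above 128 encode "killed by signal (code - 128)".
  // Under that convention 143 maps to signal 15 (SIGTERM), which says
  // nothing about whether an OOM was the underlying cause.
  def signalFromExitCode(exitCode: Int): Option[Int] =
    if (exitCode > 128) Some(exitCode - 128) else None

  def main(args: Array[String]): Unit = {
    // No user-supplied handler: the hardcoded default is appended.
    val defaults = ListBuffer("-Xmx4g")
    addDefaultOomHandler(defaults)
    println(defaults.mkString(" "))

    // User-supplied handler: the options are left untouched.
    val userSet = ListBuffer("-Xmx4g", "-XX:OnOutOfMemoryError='kill -9 %p'")
    addDefaultOomHandler(userSet)
    println(userSet.mkString(" "))

    println(signalFromExitCode(143))
  }
}
```

If the hardcoded default were removed, a user who still wants this behaviour could presumably pass the same flag through {{spark.executor.extraJavaOptions}}, which is the opt-in path the description proposes.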

Executor log:
!image-2024-10-31-14-14-06-723.png!
 
Driver log:
!image-2024-10-31-14-17-51-349.png!

  was:
Currently, the Executor startup script on YARN hardcodes the addition of 
{{-XX:OnOutOfMemoryError='kill %p'}} unless the user has already supplied an 
{{-XX:OnOutOfMemoryError}} option, so the process is killed as soon as the 
Executor encounters an OOM (Out Of Memory) error.
{code:java}
// code in YarnSparkHadoopUtil
private[yarn] def addOutOfMemoryErrorArgument(javaOpts: ListBuffer[String]): Unit = {
  if (!javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {
    if (Utils.isWindows) {
      javaOpts += escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID %%%%p")
    } else {
      javaOpts += "-XX:OnOutOfMemoryError='kill %p'"
    }
  }
}{code}
As a result, when an Executor hits an OOM, {{kill %p}} sends SIGTERM and the 
process exits with code 143 (128 + 15). The YarnAllocator cannot accurately 
determine why the Executor terminated from this exit code alone. Moreover, 
the CoarseGrainedExecutorBackend cannot guarantee that StatusUpdate messages 
are sent to the Driver before the process is killed.
Could we remove this hardcoded setting? Users who need it can still set it 
via the {{spark.executor.extraJavaOptions}} parameter.

Executor log:
!image-2024-10-31-14-14-06-723.png!
 
Driver log:
!image-2024-10-31-14-14-31-701.png!


> Remove Hardcoded OnOutOfMemoryError Setting in Executor Startup Script
> ----------------------------------------------------------------------
>
>                 Key: SPARK-50186
>                 URL: https://issues.apache.org/jira/browse/SPARK-50186
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.1, 4.0.0
>            Reporter: Ruochen Zou
>            Priority: Minor
>         Attachments: image-2024-10-31-14-14-06-723.png, 
> image-2024-10-31-14-14-31-701.png, image-2024-10-31-14-17-51-349.png
>


