[GitHub] [spark] warrenzhu25 commented on pull request #38233: [SPARK-40781][CORE] Explain exit code 137 as killed due to OOM

2022-10-28 Thread GitBox


warrenzhu25 commented on PR #38233:
URL: https://github.com/apache/spark/pull/38233#issuecomment-1295617623

   > Can you point me to the reference where YARN mode is using `137` for OOM? IIRC we simply kill the executor process in case there is an OOM, which usually results in `143` as the return code...
   > 
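   > ```java
   > public class Test {
   > 
   >   public static void main(String[] args) {
   >     // ~1 billion longs (roughly 8 GB) cannot fit in a 12 MB heap, so this throws OutOfMemoryError
   >     long[] arr = new long[1024 * 1024 * 999];
   >   }
   > }
   > ```
   > 
   > ```shell
   > $ javac Test.java
   > $ java -XX:OnOutOfMemoryError='kill %p' -Xmx12m -cp . Test ; echo Error code: $?
   > ```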
   > 
   > This is coming from `128 + SIGTERM`, and SIGTERM is 15 on Linux (so 128 + 15 = 143). See [1] for more.
   > 
   > [1] https://tldp.org/LDP/abs/html/exitcodes.html
   
   My main goal is to make the exit code more readable in standalone mode, but this code is shared by both standalone and YARN. YARN containers exiting with code 137 are described in this AWS EMR article:
https://aws.amazon.com/premiumsupport/knowledge-center/container-killed-on-request-137-emr/#:~:text=When%20a%20container%20(Spark%20executor,in%20narrow%20and%20wide%20transformations.
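Both codes in this exchange follow the common shell convention that a process terminated by signal N reports 128 + N: `kill %p` sends SIGTERM (15), giving 143, while SIGKILL (9), which is what the Linux kernel OOM killer sends, gives 137. The following is a minimal Java sketch of that decoding. The class `ExitCodeSketch`, its `explain` method, and the message wording are hypothetical illustrations, not Spark's actual exit-code helper.

```java
/** Hypothetical helper: decodes shell-style exit statuses of the form 128 + signal number. */
public final class ExitCodeSketch {

    private ExitCodeSketch() {}

    /** Returns an illustrative, human-readable explanation for a process exit code. */
    public static String explain(int exitCode) {
        // Shells (and the JDK) report "killed by signal N" as 128 + N; Linux signals run from 1 to 64.
        if (exitCode > 128 && exitCode <= 128 + 64) {
            int signal = exitCode - 128;
            switch (signal) {
                case 9:
                    // SIGKILL cannot be caught; the Linux OOM killer uses it, so memory pressure is a likely cause.
                    return "killed by SIGKILL (signal 9), possibly by the OOM killer";
                case 15:
                    // SIGTERM is a termination request, e.g. from `kill %p` in -XX:OnOutOfMemoryError.
                    return "terminated by SIGTERM (signal 15)";
                default:
                    return "killed by signal " + signal;
            }
        }
        // Anything else is treated as an ordinary exit status chosen by the program itself.
        return "exited with code " + exitCode;
    }

    public static void main(String[] args) {
        for (int code : new int[] {137, 143, 1}) {
            System.out.println(code + ": " + explain(code));
        }
    }
}
```

Note the hedge in the 137 message: a program can also call `System.exit(137)` itself, so the code only suggests, rather than proves, an OOM kill.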


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] warrenzhu25 commented on pull request #38233: [SPARK-40781][CORE] Explain exit code 137 as killed due to OOM

2022-10-13 Thread GitBox


warrenzhu25 commented on PR #38233:
URL: https://github.com/apache/spark/pull/38233#issuecomment-1278080998

   > possible container OOM
   
   @dongjoon-hyun This is used in YARN and standalone mode when creating `ExecutorExited` as the loss reason.





[GitHub] [spark] warrenzhu25 commented on pull request #38233: [SPARK-40781][CORE] Explain exit code 137 as killed due to OOM

2022-10-13 Thread GitBox


warrenzhu25 commented on PR #38233:
URL: https://github.com/apache/spark/pull/38233#issuecomment-1277834018

   > Is that true in general or YARN-specific?
   
   It's true for Linux.
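The 128 + N convention is easy to observe from Java on Linux, because the JDK reports a child process that died from signal N as exit value 128 + N. The following is a small demo, assuming a Unix-like system with a `sleep` binary on the PATH. The class name `SignalExitDemo` is illustrative only. On a typical Linux JDK, `destroyForcibly()` sends SIGKILL and `destroy()` sends SIGTERM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Demo (Unix-like systems only): a child killed by a signal reports 128 + signal number. */
public final class SignalExitDemo {

    private SignalExitDemo() {}

    /** Starts `sleep 60`, kills it, and returns the exit value the JDK reports. */
    public static int killAndGetExitCode(boolean forcibly) {
        try {
            Process child = new ProcessBuilder("sleep", "60").start();
            if (forcibly) {
                child.destroyForcibly(); // SIGKILL (9) on Unix-like JDKs
            } else {
                child.destroy();         // SIGTERM (15) on Unix-like JDKs
            }
            return child.waitFor();      // death by signal N is reported as 128 + N
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SIGKILL -> " + killAndGetExitCode(true));  // expected 137
        System.out.println("SIGTERM -> " + killAndGetExitCode(false)); // expected 143
    }
}
```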

