[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-31 Thread Adam Kennedy (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407714#comment-17407714 ] Adam Kennedy commented on SPARK-36446: -- [~tgraves] Yes, we are running with recovery enabled (in

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-11 Thread Thomas Graves (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397461#comment-17397461 ] Thomas Graves commented on SPARK-36446: --- [~adamkennedy77] ^ > YARN shuffle server restart crashes

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-06 Thread Thomas Graves (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394931#comment-17394931 ] Thomas Graves commented on SPARK-36446: --- Is this with the YARN NodeManager recovery enabled? i.e.

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-06 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394915#comment-17394915 ] Dongjoon Hyun commented on SPARK-36446: --- cc [~tgraves] and [~mridul] > YARN shuffle server

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-06 Thread Adam Kennedy (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394905#comment-17394905 ] Adam Kennedy commented on SPARK-36446: -- The problem was particularly amplified by the Executor

[jira] [Commented] (SPARK-36446) YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor

2021-08-06 Thread Adam Kennedy (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394873#comment-17394873 ] Adam Kennedy commented on SPARK-36446: -- Note: While I haven't investigated any other shuffle