mridulm commented on PR #51437: URL: https://github.com/apache/spark/pull/51437#issuecomment-3080225671
I would suggest updating the listener bus queue config so it can handle the backlog without dropping events, and also looking at driver GC behavior to check for long GC pauses. The reverse scenario could just as easily have happened: the task end or stage end events get dropped, and dynamic allocation never releases the executors. We should be careful about modifying the scheduler to work around config issues.
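
As a rough sketch of the kind of config change meant here (not something taken from the PR), the snippet below collects a few settings and renders them as `spark-submit` arguments. The capacity values and the `-verbose:gc` flag are illustrative assumptions, not recommendations, and the helper `to_submit_args` is hypothetical. It also assumes the dynamic allocation listener is served by the `executorManagement` queue, which is worth confirming for the Spark version in use.

```python
# Illustrative values only; size them against the observed backlog and driver heap.
listener_bus_conf = {
    # Capacity shared by all listener bus queues unless overridden (Spark default: 10000).
    "spark.scheduler.listenerbus.eventqueue.capacity": 30000,
    # Per-queue override, assumed here to cover the dynamic allocation listener.
    "spark.scheduler.listenerbus.eventqueue.executorManagement.capacity": 60000,
    # Basic GC logging on the driver, to check whether long pauses line up with drops.
    "spark.driver.extraJavaOptions": "-verbose:gc",
}


def to_submit_args(conf):
    """Render a config mapping as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


submit_args = to_submit_args(listener_bus_conf)
print(" ".join(submit_args))
```

A larger queue holds more events in driver memory, so raising capacity trades heap for fewer drops. If GC pauses are the real cause of the backlog, that should probably be addressed first.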
