s0nskar commented on PR #3601: URL: https://github.com/apache/celeborn/pull/3601#issuecomment-4038469927
Wouldn't a server-side approach like https://github.com/apache/celeborn/pull/3336 make more sense for handling this? Just thinking out loud, here are a few cons I can see with this approach:

1. We are not considering the shuffle data the app already has stored on the Celeborn server, or multiple shuffle stages running in parallel.
2. We remove the written bytes as soon as all mappers have completed:
   ```
   if (shuffleWriteLimitEnabled) {
     shuffleTotalWrittenBytes.remove(shuffleId)
   }
   ```
   However, the shuffle data stays on the server until shuffle cleanup happens, so it stops counting toward the limit while it still occupies storage.
3. There is no central config management. Such configs should be managed by the config store so they can be applied globally to all apps, instead of each app controlling them (override functionality can be provided for certain apps).

Cons with the server-side approach:

1. Since it relies on heartbeats, for very high-throughput applications the gap between reaching the threshold and actually killing the app can be large, but for normal applications it should be fine.

@SteNicholas @RexXiong, what are your thoughts on this?
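A minimal sketch of what cons 1 and 2 point toward, in Scala to match the quoted snippet. It assumes a hypothetical `AppWriteTracker` (not an existing Celeborn class, and `recordWrite` / `onShuffleCleanup` are illustrative names): written bytes are checked against an app-wide total across all live shuffles, and a shuffle's bytes are released only when that shuffle is cleaned up, not when its mappers finish.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical tracker (not a Celeborn API): counts bytes per shuffle but
// enforces the limit on the app-wide total across all live shuffles.
class AppWriteTracker(limitBytes: Long) {
  private val perShuffle = new ConcurrentHashMap[Int, AtomicLong]()
  private val appTotal = new AtomicLong(0L)
  private val newCounter: java.util.function.Function[Int, AtomicLong] =
    _ => new AtomicLong(0L)

  // Records a write; returns false once the app-wide total exceeds the limit.
  def recordWrite(shuffleId: Int, bytes: Long): Boolean = {
    perShuffle.computeIfAbsent(shuffleId, newCounter).addAndGet(bytes)
    appTotal.addAndGet(bytes) <= limitBytes
  }

  // Mappers finishing does NOT release bytes, because the data still sits on
  // the server. Bytes are released only when the shuffle is cleaned up.
  def onShuffleCleanup(shuffleId: Int): Unit = {
    val counter = perShuffle.remove(shuffleId)
    if (counter != null) appTotal.addAndGet(-counter.get())
  }

  def totalBytes: Long = appTotal.get()
}
```

With a 100-byte limit, a 60-byte write to shuffle 0 followed by a 50-byte write to shuffle 1 trips the limit even though neither shuffle alone exceeds it; cleaning up shuffle 0 then frees its 60 bytes.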
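For con 3, one possible resolution order, sketched with hypothetical names (`WriteLimitConfig.resolve` is not a Celeborn API, and the real config store may model config levels differently): a per-app override wins, otherwise the cluster-wide value from the central config store applies, otherwise a built-in default.

```scala
// Hypothetical resolution order for a write-limit setting (not a Celeborn API):
// per-app override > cluster-wide value from the central store > default.
object WriteLimitConfig {
  def resolve(
      appId: String,
      appOverrides: Map[String, Long],
      clusterValue: Option[Long],
      default: Long): Long =
    appOverrides.get(appId).orElse(clusterValue).getOrElse(default)
}
```

Keeping the override map separate from the cluster value means operators control the global limit, and only the apps explicitly listed deviate from it.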
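To put a rough number on the server-side con: if the server only observes an app's usage at heartbeat boundaries, then in the worst case the app keeps writing for about one full heartbeat interval after crossing the threshold, so the overshoot is roughly throughput times interval. This sketch assumes that simplified model and ignores the time needed to propagate the kill; the interval is a parameter here, not Celeborn's actual default.

```scala
// Simplified, hypothetical model: usage is only observed at heartbeat
// boundaries, so an app can write for up to one full interval after it
// crosses the threshold. Kill-propagation latency is ignored.
object HeartbeatOvershoot {
  def worstCaseOvershootBytes(bytesPerSecond: Long, heartbeatIntervalMs: Long): Long =
    bytesPerSecond * heartbeatIntervalMs / 1000L
}
```

At 1 GiB/s with a 10 s interval that is about 10 GiB past the limit, while at 100 MiB/s it is about 1,000 MiB, which matches the intuition that the gap matters mainly for very high-throughput apps.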
