thinkharderdev commented on issue #650: URL: https://github.com/apache/arrow-ballista/issues/650#issuecomment-1422388406
> We can just leverage the current DataFusion Metrics system and TaskStatus update rpc and add necessary throttling/checking/aborting logic when we handle the Task finish event in the Ballista Scheduler.

This would be a good first step, but I don't think it really solves the issue. We would need to wait for tasks to finish, which means that if we had several really long-running tasks running concurrently, they could easily overload the system with no way to cancel them, since every task takes a long time to complete.

I agree that Spark's accumulator causes issues, but I think that if we can design something more purpose-built (used only internally by the scheduler, not exposed through the public API), then it could be relatively lightweight.
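As a rough illustration of what "purpose-built" might look like, here is a minimal sketch in which executors send cumulative progress reports while a task is still running, and a scheduler-internal tracker decides when a job has exceeded its budget. Every name here (`TaskProgress`, `JobBudgetTracker`, `bytes_so_far`, the byte-based limit) is hypothetical and is not an existing Ballista or DataFusion API; it only shows the shape of the idea.

```rust
use std::collections::HashMap;

/// Hypothetical in-flight report sent by an executor while a task is still running.
#[derive(Debug, Clone)]
struct TaskProgress {
    job_id: String,
    task_id: u64,
    /// Cumulative bytes produced by this task so far (an illustrative metric).
    bytes_so_far: u64,
}

/// Hypothetical scheduler-internal tracker; nothing here is exposed publicly.
struct JobBudgetTracker {
    /// Per-job byte limit (an assumed policy, chosen only for illustration).
    limit_bytes: u64,
    /// Latest cumulative value per job, per task.
    latest: HashMap<String, HashMap<u64, u64>>,
}

impl JobBudgetTracker {
    fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, latest: HashMap::new() }
    }

    /// Record a report and return the task ids to cancel if the job is over budget.
    /// An empty vector means the job is still within its limit.
    fn report(&mut self, p: TaskProgress) -> Vec<u64> {
        let tasks = self.latest.entry(p.job_id).or_default();
        // Reports are cumulative, so keep the max to tolerate reordered messages.
        let slot = tasks.entry(p.task_id).or_insert(0);
        *slot = (*slot).max(p.bytes_so_far);

        let total: u64 = tasks.values().sum();
        if total > self.limit_bytes {
            let mut ids: Vec<u64> = tasks.keys().copied().collect();
            ids.sort_unstable();
            ids
        } else {
            Vec::new()
        }
    }
}

fn main() {
    let mut tracker = JobBudgetTracker::new(1_000);
    let first = tracker.report(TaskProgress {
        job_id: "job-1".to_string(),
        task_id: 1,
        bytes_so_far: 600,
    });
    assert!(first.is_empty());

    // The second task pushes the job over its limit while both tasks are still running.
    let second = tracker.report(TaskProgress {
        job_id: "job-1".to_string(),
        task_id: 2,
        bytes_so_far: 500,
    });
    assert_eq!(second, vec![1, 2]);
    println!("cancel tasks: {:?}", second);
}
```

The point of the sketch is that the decision to cancel happens on a mid-task report rather than on the task finish event, so long-running tasks can still be stopped.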
