mingmwang commented on issue #650: URL: https://github.com/apache/arrow-ballista/issues/650#issuecomment-1422131979
If we just want cancel tasks early, protect systems from heavy queries/heavy scans, I think we do not need to introduce the Accumulator. Spark's Accumulator and Metrics system is very heavy and cause lots of memory issues and management burden to the Spark Driver, the Accumulator need to be registered in the Spark Driver and life cycle is managed by the Spark Driver. We can just leverage the current DataFusion Metrics system and TaskStatus update rpc and add necessary throttling/checking/aborting logic when we handle the Task finish event in the Ballista Scheduler. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
