Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20386

There is a lesson I learned from streaming data source v1: even though it's totally internal, people are already using it and have asked us not to remove the API. I think the same is true for the file-based data source. It's internal, but people may still use it. Although we haven't found any use case for `onTaskCommit` among the built-in data sources, it may be required by external data sources.

One possible use case: the implementation needs a 2-phase commit on the driver side, and it can use `onTaskCommit` to finish the first phase earlier, as task commit messages arrive. Or maybe someone wants to collect the commit messages received so far and report statistics regularly; they would need `onTaskCommit` for that too.
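To make the two use cases concrete, here is a rough sketch in plain Python. It is a toy model and not Spark's commit protocol API: `TaskCommitMessage`, `TwoPhaseDriverCommitter`, `on_task_commit`, `commit_job`, and `run_job` are all hypothetical names made up for illustration. It assumes only that the driver invokes a callback once per task commit message, before the job-level commit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskCommitMessage:
    """Hypothetical stand-in for the message a task sends to the driver on commit."""
    task_id: int
    rows_written: int


@dataclass
class TwoPhaseDriverCommitter:
    """Toy driver-side committer (an assumption for illustration, not Spark's API)."""
    prepared: List[int] = field(default_factory=list)
    rows_so_far: int = 0
    committed: bool = False

    def on_task_commit(self, msg: TaskCommitMessage) -> None:
        # Phase 1 runs as each message arrives, instead of waiting for the job to end.
        self.prepared.append(msg.task_id)
        # Running statistics over the commit messages received so far.
        self.rows_so_far += msg.rows_written

    def commit_job(self, messages: List[TaskCommitMessage]) -> None:
        # Phase 2 only verifies that every task was prepared, then finalizes.
        missing = {m.task_id for m in messages} - set(self.prepared)
        if missing:
            raise RuntimeError(f"tasks not prepared: {sorted(missing)}")
        self.committed = True


def run_job(committer: TwoPhaseDriverCommitter,
            messages: List[TaskCommitMessage]) -> TwoPhaseDriverCommitter:
    """Simulate the driver: one callback per task commit, then the job commit."""
    for msg in messages:
        committer.on_task_commit(msg)
    committer.commit_job(messages)
    return committer
```

Without the per-task callback, both the first phase and the statistics would have to wait until `commit_job`, when all messages are handed over at once.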