xuanyuanking edited a comment on pull request #31944:
URL: https://github.com/apache/spark/pull/31944#issuecomment-806420508
Besides the flexibility, I can see that the new API here helps with the extensibility of the progress reporter. If new customized metrics are needed in the future, we won't need to change the top level of the output JSON string; the new metrics can be added as entries in the map-like field instead. It also gives us the possibility to customize metrics for different SparkDataSource.

```
It's definitely true that for Kafka source, this api isn't that necessary because of that reported latest offset.
```

Yes. Maybe you can also move the metrics added in #30988 to your new implementation. Either way is OK for me. cc @viirya for more opinions.

Agree with Gabor and Jungtaek: we need to provide more use cases for the new API. Besides the metrics added for the Kafka source in this PR, I think we can use this new API to expose more customized metrics specific to FileStreamSource, e.g., the number of files or bytes outstanding. Of course, end users can also implement this in their customized SparkDataStream. Maybe @yijiacui-db can provide more use cases.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org