[GitHub] [spark] xuanyuanking edited a comment on pull request #31944: [SPARK-34854][SQL][SS] Expose source metrics via progress report and add Kafka use-case to report delay.

GitBox Thu, 25 Mar 2021 00:23:50 -0700


xuanyuanking edited a comment on pull request #31944:
URL: https://github.com/apache/spark/pull/31944#issuecomment-806420508



   Besides the flexibility, I can see that the new API also helps with the 
extensibility of the progress reporter. If new custom metrics are needed in 
the future, we don't need to change the top level of the output JSON string; 
we can simply add new fields to the map-like field. It also gives us the 
possibility to customize metrics for each SparkDataSource.
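
To illustrate the extensibility point, here is a minimal sketch in plain Python. It does not use Spark's classes; the `source_entry` helper, the field layout, and all metric names are assumptions made up for illustration. It only shows that sources with different custom metrics can share the same top-level JSON shape when the metrics sit in a map-like field.

```python
import json


def source_entry(description, num_input_rows, metrics=None):
    """Build one per-source entry of a progress report (hypothetical shape)."""
    return {
        "description": description,
        "numInputRows": num_input_rows,
        # Map-like field: source-specific metrics go here as string pairs,
        # so adding a metric never adds a new top-level key.
        "metrics": dict(metrics or {}),
    }


# Illustrative metric names only; they are not taken from the PR.
kafka = source_entry("KafkaSource[demo-topic]", 100,
                     {"avgOffsetsBehindLatest": "4.0"})
other = source_entry("CustomSource[demo]", 20,
                     {"customLag": "7", "customQueueDepth": "2"})

# Both entries expose the same top-level keys although their metrics differ.
assert kafka.keys() == other.keys()

report_json = json.dumps({"sources": [kafka, other]}, indent=2)
print(report_json)
```

A consumer of the JSON can keep parsing the fixed top-level fields and treat `metrics` as an opaque string map, which is why new metrics would not require a schema change in this sketch.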
   
   ```
   It's definitely true that for Kafka source, this api isn't that necessary 
because of that reported latest offset.
   ```
   Yes. Maybe you can also move the metrics added in #30988 to your new 
implementation. Either way is OK with me. cc @viirya for more opinions.
   
   I agree with Gabor and Juntaek: we need to provide more use cases for the 
new API. Besides the metrics added for the Kafka source in this PR, I think 
we can use this new API to expose more custom metrics that are specific to 
FileStreamSource, e.g., the number of files or bytes outstanding.
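
As a rough sketch of what such file-source metrics could look like, the plain-Python snippet below computes the number of files and bytes not yet processed and returns them as a string map. The function `outstanding_metrics`, its inputs, and the metric names are hypothetical; this is not how FileStreamSource tracks files internally.

```python
def outstanding_metrics(discovered, processed_paths):
    """Return outstanding-file metrics as a string-to-string map.

    discovered: dict mapping file path -> size in bytes (all files seen so far)
    processed_paths: set of paths that have already been read
    """
    # Files that were discovered but not yet processed.
    pending = {path: size for path, size in discovered.items()
               if path not in processed_paths}
    return {
        "numFilesOutstanding": str(len(pending)),
        "numBytesOutstanding": str(sum(pending.values())),
    }


metrics = outstanding_metrics(
    {"a.json": 100, "b.json": 250, "c.json": 50},
    {"a.json"},
)
print(metrics)
```

Values are kept as strings here on the assumption that the map-like field carries string pairs, so the source can report any metric without a typed schema.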
   
   Of course, end users can also implement this in their custom SparkDataStream. 
Maybe @yijiacui-db can provide more use cases.


