yijiacui-db edited a comment on pull request #31944:
URL: https://github.com/apache/spark/pull/31944#issuecomment-828832285


   > > > I've tested it on real cluster and works fine.
   > > > Just a question. How is this intended to be used for dynamic allocation?
   > > 
   > > 
   > > Users can implement this interface in their customized SparkDataStream 
and learn how far it is falling behind through the progress listener. Maybe this 
can provide more useful information to guide/trigger the auto scaling.
   > 
   > This is a valid use case. But my question is that the current offsets in 
`SourceProgress` should already provide the information the use case needs 
(consumed offset, available offset). The source progress should also be 
available on the customized SparkDataStream. Do you mean the metrics from the 
customized SparkDataStream are not offset-related?
   
   Yes. The available offset is retrieved through reportLatestOffset, which 
Kafka already implements, so the information is duplicated there: we can use 
the latest consumed offset and the available offset to compute how far the 
stream is falling behind.
   But for other customized Spark data streams, reportLatestOffset might not be 
implemented, so the source progress report gives no way to know the latest 
available offset and do the computation.  
Also, customized metrics, for example how far the application is falling 
behind the latest data, can be represented in other ways (not only as a 
number of offsets), depending on how the stream defines them.
   
   We want to introduce this metrics interface so that users can implement it 
for their data streams and obtain the metrics they want from the source 
progress report. The Kafka stream is just an example of how users can implement 
this and retrieve that information, but it happens to already have the latest 
available offset, which makes it look a little duplicated and hard to reason 
about.
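
   To make the distinction concrete, here is a rough, self-contained sketch of 
the two cases described above. It does not use the interface from this PR: 
`ReportsLagMetrics`, `OffsetBasedStream`, `TimeBasedStream` and the metric 
names are hypothetical, and offsets are simplified to plain `long` values. The 
first stream knows its latest available offset, so its lag is just a 
subtraction; the second has no such offset and reports lag as a time 
difference instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: every name here is hypothetical and stands in
// for whatever interface a custom stream would really implement.
final class SourceLagSketch {

    /** Hypothetical metrics hook: maps the latest consumed offset to custom metrics. */
    interface ReportsLagMetrics {
        Map<String, String> metrics(Optional<Long> latestConsumedOffset);
    }

    /** A stream that knows its latest available offset (the Kafka-like case). */
    static final class OffsetBasedStream implements ReportsLagMetrics {
        private final long latestAvailableOffset;

        OffsetBasedStream(long latestAvailableOffset) {
            this.latestAvailableOffset = latestAvailableOffset;
        }

        @Override
        public Map<String, String> metrics(Optional<Long> latestConsumedOffset) {
            Map<String, String> result = new HashMap<>();
            // Lag in offsets = available - consumed, clamped at zero.
            latestConsumedOffset.ifPresent(consumed ->
                result.put("offsetsBehindLatest",
                    Long.toString(Math.max(0L, latestAvailableOffset - consumed))));
            return result;
        }
    }

    /** A stream with no available offset; it defines lag in milliseconds. */
    static final class TimeBasedStream implements ReportsLagMetrics {
        private final long latestEventTimeMs;
        private final long lastProcessedEventTimeMs;

        TimeBasedStream(long latestEventTimeMs, long lastProcessedEventTimeMs) {
            this.latestEventTimeMs = latestEventTimeMs;
            this.lastProcessedEventTimeMs = lastProcessedEventTimeMs;
        }

        @Override
        public Map<String, String> metrics(Optional<Long> latestConsumedOffset) {
            Map<String, String> result = new HashMap<>();
            // The consumed offset is ignored: this stream measures lag in time.
            result.put("millisBehindLatest",
                Long.toString(Math.max(0L, latestEventTimeMs - lastProcessedEventTimeMs)));
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(new OffsetBasedStream(120L).metrics(Optional.of(100L)));
        System.out.println(new TimeBasedStream(5_000L, 3_500L).metrics(Optional.empty()));
    }
}
```

   In the offset-based case the metric repeats what the consumed and available 
offsets already imply, which is the duplication mentioned above. In the 
time-based case the metric is the only place the lag shows up.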




