[jira] [Comment Edited] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

L. C. Hsieh (Jira) Mon, 04 Jan 2021 22:28:36 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258692#comment-17258692
 ]


L. C. Hsieh edited comment on SPARK-33833 at 1/5/21, 6:27 AM:
--------------------------------------------------------------

Hmm, I ran a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. When I checked the 
"__consumer_offsets" topic in Kafka, I found that whether or not Spark commits 
its progress to Kafka, a record for the consumer group of the Spark SS 
query is always present in "__consumer_offsets".
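
For illustration only (this is not the code used in the test above): a minimal
Python sketch of the bookkeeping that committing offset progress implies. It
assumes a progress event shaped like Structured Streaming's query progress,
where each Kafka source carries an "endOffset" mapping of topic to partition to
offset. The function name, the sample topic "events", and the sample numbers
are hypothetical, and the actual commit call to Kafka is deliberately left out.

```python
import json


def offsets_to_commit(progress):
    """Flatten the Kafka end offsets of one micro-batch.

    Returns {(topic, partition): offset}. Assumes every source in the
    progress event is a Kafka source (an assumption of this sketch).
    """
    commits = {}
    for source in progress.get("sources", []):
        end = source.get("endOffset")
        if end is None:
            continue
        # endOffset may arrive as a JSON string or as an already-parsed dict.
        if isinstance(end, str):
            end = json.loads(end)
        for topic, partitions in end.items():
            for partition, offset in partitions.items():
                commits[(topic, int(partition))] = int(offset)
    return commits


# Hypothetical progress event for a query reading a topic named "events".
sample_progress = {
    "sources": [
        {
            "description": "KafkaV2[Subscribe[events]]",
            "endOffset": '{"events": {"0": 120, "1": 95}}',
        }
    ]
}

print(offsets_to_commit(sample_progress))
```

The resulting map is what a commit step would hand to a Kafka consumer for the
query's consumer group.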

According to https://github.com/linkedin/Burrow/wiki, Burrow reads consumer 
group info from this "__consumer_offsets" topic. So whether Spark commits or 
not, there will be a record for the consumer group. Does that mean Burrow still 
works without Spark committing offset progress to Kafka?

If so, then Spark doesn't need any change for this ticket.
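
To make the question concrete, here is a rough Python sketch of a per-partition
lag calculation. It assumes lag is defined as the log end offset minus the last
committed offset; that definition, the function name, and the sample numbers
are assumptions for illustration and have not been checked against Burrow's
implementation.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Compute lag per (topic, partition).

    Lag is None when no committed offset exists for a partition, because
    there is nothing to subtract from the log end offset.
    """
    lag = {}
    for tp, end in log_end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            lag[tp] = None
        else:
            # Clamp at zero in case the committed offset is ahead of a stale end offset.
            lag[tp] = max(end - committed, 0)
    return lag


# Hypothetical numbers for a three-partition topic named "events".
log_end = {("events", 0): 150, ("events", 1): 95, ("events", 2): 40}
committed = {("events", 0): 120, ("events", 1): 95}

print(consumer_lag(log_end, committed))
```

Under this definition, a partition with no committed offset yields no lag
value, which is why it matters what the consumer group record in
"__consumer_offsets" actually contains.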




was (Author: viirya):
Hmm, I ran a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. When I checked the 
"__consumer_offsets" topic in Kafka, I found that whether or not Spark commits 
its progress to Kafka, a record for the consumer group of the Spark SS 
query is always present in "__consumer_offsets".

According to https://github.com/linkedin/Burrow/wiki, Burrow reads consumer 
group info from this "__consumer_offsets" topic. So whether Spark commits or 
not, there will be a record for the consumer group. Does that mean Burrow still 
works without Spark committing offset progress to Kafka?



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> ----------------------------------------------------------------
>
>                 Key: SPARK-33833
>                 URL: https://issues.apache.org/jira/browse/SPARK-33833
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.1
>            Reporter: Sam Davarnia
>            Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow, as can be done 
> with DStreams. We have used stream hooks as described 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
