[ 
https://issues.apache.org/jira/browse/SPARK-24697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-24697.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 21744
[https://github.com/apache/spark/pull/21744]

> Fix the reported start offsets in streaming query progress
> ----------------------------------------------------------
>
>                 Key: SPARK-24697
>                 URL: https://issues.apache.org/jira/browse/SPARK-24697
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Arun Mahadevan
>            Priority: Major
>             Fix For: 3.0.0
>
>
> A streaming query reports progress during each trigger (e.g. after runBatch in 
> MicroBatchExecution). However, the reported progress has wrong offsets because 
> the offsets are committed, and committedOffsets is updated to 
> availableOffsets, before the progress is reported.
> As a result, the reported startOffset and endOffset are always the 
> same.
> Sample output for the Kafka source is below. Here 11 rows are processed in the 
> microbatch, yet the start and end offsets are the same.
>  
> {code:java}
> {
>   "id" : "76bf5515-55be-46af-bc79-9fc92cc6d856",
>   "runId" : "b526f0f4-24bf-4ddc-b6e8-7b0cc83bdbe8",
>   ...
>   "sources" : [ {
>     "description" : "KafkaV2[Subscribe[topic2]]",
>     "startOffset" : {
>       "topic2" : {
>         "0" : 44
>       }
>     },
>     "endOffset" : {
>       "topic2" : {
>         "0" : 44
>       }
>     },
>     "numInputRows" : 11,
>     "inputRowsPerSecond" : 1.099670098970309,
>     "processedRowsPerSecond" : 1.8829168093118795
>   } ],
>   ...
> }
> {code}
>  
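
The ordering problem described above can be illustrated with a small toy
model. This is only a sketch, not Spark code: run_trigger, looks_stale and
the state dictionary are hypothetical names, offsets are reduced to a single
integer for one partition, and the starting offset of 33 is an assumption
chosen so that 11 rows end at offset 44 as in the sample output. The
"report before commit" branch shows one ordering that avoids the symptom; it
is not necessarily what pull request 21744 does.

```python
# Toy model of a micro-batch trigger; all names are illustrative, not Spark's API.

def run_trigger(state, new_end, report_before_commit):
    """Process one micro-batch and return the progress it would report."""
    state["available"] = new_end
    rows = state["available"] - state["committed"]

    def snapshot():
        # Progress reads the offsets as they stand at the moment of reporting.
        return {
            "startOffset": state["committed"],
            "endOffset": state["available"],
            "numInputRows": rows,
        }

    if report_before_commit:
        progress = snapshot()                     # start is still the previous batch end
        state["committed"] = state["available"]  # commit afterwards
    else:
        state["committed"] = state["available"]  # commit first (the reported behaviour)
        progress = snapshot()                     # start has already caught up with end
    return progress


def looks_stale(progress):
    """Symptom from the report: rows were read, yet start and end offsets match."""
    return (progress["numInputRows"] > 0
            and progress["startOffset"] == progress["endOffset"])


# Assumed starting point: offset 33 committed, 11 new rows available up to 44.
buggy = run_trigger({"committed": 33, "available": 33}, 44, report_before_commit=False)
fixed = run_trigger({"committed": 33, "available": 33}, 44, report_before_commit=True)
print(buggy, looks_stale(buggy))
print(fixed, looks_stale(fixed))
```

A check like looks_stale could also be applied to each entry of "sources" in
a real progress report to spot the same symptom, assuming the offsets have
first been parsed from the JSON.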



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

