Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17219#discussion_r106798639
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -419,14 +441,44 @@ class StreamExecution(
                 SQLConf.SHUFFLE_PARTITIONS.key, shufflePartitionsToUse.toString)
             }
     
    -        logDebug(s"Found possibly unprocessed offsets $availableOffsets " +
    -          s"at batch timestamp ${offsetSeqMetadata.batchTimestampMs}")
    +        offsetLog.get(latestBatchId - 1).foreach { lastOffsets =>
    +          committedOffsets = lastOffsets.toStreamProgress(sources)
    +        }
     
    -        offsetLog.get(batchId - 1).foreach {
    -          case lastOffsets =>
    -            committedOffsets = lastOffsets.toStreamProgress(sources)
    -            logDebug(s"Resuming with committed offsets: $committedOffsets")
    +        /* identify the current batch id: if commit log indicates we successfully processed the
    +         * latest batch id in the offset log, then we can safely move to the next batch
    +         * i.e., committedBatchId + 1
    +         */
    +        batchCommitLog.getLatest() match {
    +          case Some((completionBatchId, _))
    +            if latestBatchId == completionBatchId => {
    +            /* The last batch was successfully committed, so we can safely process a
    +             * new next batch but first:
    +             * Make a call to getBatch using the offsets from previous batch.
    +             * because certain sources (e.g., KafkaSource) assume on restart the last
    +             * batch will be executed before getOffset is called again.
    +             */
    +            availableOffsets.foreach {
    +              case (source, end)
    +                if committedOffsets.get(source).map(_ != end).getOrElse(true) =>
    +                val start = committedOffsets.get(source)
    +                logDebug(s"Initializing offset retrieval from $source " +
    --- End diff --
    
    Incorrect: this is not offset retrieval. Rather, say something like
    "getting the latest batch from the sources without executing it".
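
    One possible shape for the reworded message, as a minimal standalone
    sketch. Offset, sourcesToRefetch, RestartLogSketch, the String source
    names, and the exact message text are all hypothetical stand-ins for
    illustration; they are not the types or wording used in StreamExecution.
    The guard mirrors the condition in the diff (a source is re-fetched when
    its committed offset is missing or differs from the available one).

```scala
// Hypothetical stand-ins for the types in the diff; not Spark's real classes.
object RestartLogSketch {
  final case class Offset(value: Long)

  // For each source whose committed offset is absent or differs from the
  // available offset, return the source name and a debug message that
  // describes re-fetching the batch without executing it.
  def sourcesToRefetch(
      committed: Map[String, Offset],
      available: Map[String, Offset]): Seq[(String, String)] =
    available.toSeq.collect {
      // Equivalent to committed.get(source).map(_ != end).getOrElse(true)
      case (source, end) if !committed.get(source).contains(end) =>
        val start = committed.get(source)
        source -> (s"Getting latest batch from $source " +
          s"(start=$start, end=$end) without executing it")
    }

  def main(args: Array[String]): Unit = {
    val committed = Map("kafka" -> Offset(10L))
    val available = Map("kafka" -> Offset(12L), "file" -> Offset(3L))
    // Prints one message per source that needs its batch re-fetched.
    sourcesToRefetch(committed, available).foreach { case (_, msg) => println(msg) }
  }
}
```

    In the real code the message would go through logDebug next to the
    getBatch call rather than being returned as a string.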

