Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r106798639 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -419,14 +441,44 @@ class StreamExecution( SQLConf.SHUFFLE_PARTITIONS.key, shufflePartitionsToUse.toString) } - logDebug(s"Found possibly unprocessed offsets $availableOffsets " + - s"at batch timestamp ${offsetSeqMetadata.batchTimestampMs}") + offsetLog.get(latestBatchId - 1).foreach { lastOffsets => + committedOffsets = lastOffsets.toStreamProgress(sources) + } - offsetLog.get(batchId - 1).foreach { - case lastOffsets => - committedOffsets = lastOffsets.toStreamProgress(sources) - logDebug(s"Resuming with committed offsets: $committedOffsets") + /* identify the current batch id: if commit log indicates we successfully processed the + * latest batch id in the offset log, then we can safely move to the next batch + * i.e., committedBatchId + 1 + */ + batchCommitLog.getLatest() match { + case Some((completionBatchId, _)) + if latestBatchId == completionBatchId => { + /* The last batch was successfully committed, so we can safely process a + * new next batch but first: + * Make a call to getBatch using the offsets from previous batch. + * because certain sources (e.g., KafkaSource) assume on restart the last + * batch will be executed before getOffset is called again. + */ + availableOffsets.foreach { + case (source, end) + if committedOffsets.get(source).map(_ != end).getOrElse(true) => + val start = committedOffsets.get(source) + logDebug(s"Initializing offset retrieval from $source " + --- End diff -- incorrect. you are not doing offset retrieval. rather say something like getting latest batch from sources but not executing.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org