Github user kunalkhamar commented on a diff in the pull request:

https://github.com/apache/spark/pull/17216#discussion_r106285230

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -380,7 +387,27 @@ class StreamExecution(
         logInfo(s"Resuming streaming query, starting with batch $batchId")
         currentBatchId = batchId
         availableOffsets = nextOffsets.toStreamProgress(sources)
-        offsetSeqMetadata = nextOffsets.metadata.getOrElse(OffsetSeqMetadata())
+
+        // initialize metadata
+        val shufflePartitionsSparkSession: Int = sparkSession.conf.get(SQLConf.SHUFFLE_PARTITIONS)
+        offsetSeqMetadata = {
+          if (nextOffsets.metadata.isEmpty) {
+            OffsetSeqMetadata(0, 0,
+              Map(SQLConf.SHUFFLE_PARTITIONS.key -> shufflePartitionsSparkSession.toString))
+          } else {
+            val metadata = nextOffsets.metadata.get
+            val shufflePartitionsToUse = metadata.conf.getOrElse(SQLConf.SHUFFLE_PARTITIONS.key, {
+              // For backward compatibility, if # partitions was not recorded in the offset log,
+              // then ensure it is not missing. The new value is picked up from the conf.
+              logDebug("Number of shuffle partitions from previous run not found in checkpoint. "
--- End diff --

Changed this to log a warning. I rechecked the semantics: it works as expected, and the warning is only printed the first time a query is restarted after the upgrade. Once we restart a query from a v2.1 checkpoint and then stop it, any new offsets written out will contain the number of shuffle partitions. Any future restart reads these new offsets in `StreamExecution.populateStartOffsets -> offsetLog.getLatest` and picks up the recorded number of shuffle partitions.

Useful to note for future reference: we do not rewrite the old offset files to include the number of shuffle partitions. The semantics are still correct because of the call to `offsetLog.getLatest`, which only reads the newest offsets.
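To make the restart semantics above concrete, here is a minimal, self-contained Scala sketch. `Metadata`, `SimpleOffsetLog`, and `ShufflePartitionsRestart` are hypothetical stand-ins, not Spark's `OffsetSeqMetadata` or its offset log classes; only the key string `spark.sql.shuffle.partitions` matches real Spark. As a simplification, it assumes each restart immediately writes one new batch of offsets recording the value in use.

```scala
import scala.collection.mutable

// Hypothetical stand-in for offset-log metadata (NOT Spark's OffsetSeqMetadata).
final case class Metadata(conf: Map[String, String] = Map.empty)

// Hypothetical in-memory offset log, keyed by batch id and kept sorted.
final class SimpleOffsetLog {
  private val entries = mutable.TreeMap.empty[Long, Metadata]

  def add(batchId: Long, metadata: Metadata): Unit = entries(batchId) = metadata

  // Models the idea behind offsetLog.getLatest: only the newest entry is consulted.
  def getLatest: Option[(Long, Metadata)] = entries.lastOption

  def get(batchId: Long): Option[Metadata] = entries.get(batchId)
}

object ShufflePartitionsRestart {
  val ShufflePartitionsKey = "spark.sql.shuffle.partitions"

  // Use the recorded value if present; otherwise warn once and fall back to the session conf.
  def resolve(latest: Metadata, sessionValue: Int, warnings: mutable.Buffer[String]): Int =
    latest.conf.get(ShufflePartitionsKey) match {
      case Some(recorded) => recorded.toInt
      case None =>
        warnings += s"$ShufflePartitionsKey not found in checkpoint; using session value $sessionValue"
        sessionValue
    }

  // Simulates a restart: read the latest offsets, resolve the value, then write the next batch.
  def restart(log: SimpleOffsetLog, sessionValue: Int, warnings: mutable.Buffer[String]): Int = {
    val (lastBatch, metadata) = log.getLatest.getOrElse((-1L, Metadata()))
    val partitions = resolve(metadata, sessionValue, warnings)
    // New offsets now record the value in use; older entries are left untouched.
    log.add(lastBatch + 1, Metadata(Map(ShufflePartitionsKey -> partitions.toString)))
    partitions
  }

  def main(args: Array[String]): Unit = {
    val warnings = mutable.ArrayBuffer.empty[String]
    val log = new SimpleOffsetLog
    log.add(0L, Metadata()) // v2.1-style entry: no shuffle partitions recorded

    val first = restart(log, 200, warnings)  // warns, falls back to session value 200
    val second = restart(log, 10, warnings)  // reads recorded 200 from latest entry, no warning

    println(s"first=$first second=$second warnings=${warnings.size}")
    println(s"batch 0 still unrecorded: ${log.get(0L).exists(_.conf.isEmpty)}")
  }
}
```

Running `main` prints `first=200 second=200 warnings=1` and `batch 0 still unrecorded: true`: the warning fires only on the first restart from the old checkpoint, the second restart ignores the changed session value because the latest entry carries the recorded one, and the old entry is never rewritten.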