[GitHub] spark pull request #17216: [SPARK-19873][SS] Record num shuffle partitions i...

kunalkhamar Wed, 15 Mar 2017 14:16:48 -0700

Github user kunalkhamar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17216#discussion_r106285230
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -380,7 +387,27 @@ class StreamExecution(
             logInfo(s"Resuming streaming query, starting with batch $batchId")
             currentBatchId = batchId
             availableOffsets = nextOffsets.toStreamProgress(sources)
    -        offsetSeqMetadata = nextOffsets.metadata.getOrElse(OffsetSeqMetadata())
    +
    +        // initialize metadata
    +        val shufflePartitionsSparkSession: Int = sparkSession.conf.get(SQLConf.SHUFFLE_PARTITIONS)
    +        offsetSeqMetadata = {
    +          if (nextOffsets.metadata.isEmpty) {
    +            OffsetSeqMetadata(0, 0,
    +              Map(SQLConf.SHUFFLE_PARTITIONS.key -> shufflePartitionsSparkSession.toString))
    +          } else {
    +            val metadata = nextOffsets.metadata.get
    +            val shufflePartitionsToUse = metadata.conf.getOrElse(SQLConf.SHUFFLE_PARTITIONS.key, {
    +              // For backward compatibility, if # partitions was not recorded in the offset log,
    +              // then ensure it is not missing. The new value is picked up from the conf.
    +              logDebug("Number of shuffle partitions from previous run not found in checkpoint. "
    --- End diff --
    
    Changed this to log a warning.
    I rechecked the semantics: it works as expected, and the warning is printed only on the first restart after the upgrade.
    Once we restart a query from a v2.1 checkpoint and then stop it, any new offsets written out will contain the number of shuffle partitions. Any future restart will read these new offsets in `StreamExecution.populateStartOffsets->offsetLog.getLatest` and pick up the recorded number of shuffle partitions.
    For future reference: we do not rewrite the old offset files to add the number of shuffle partitions. The semantics are still correct because the restart path goes through `offsetLog.getLatest`, which reads the most recent offsets.
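
    To illustrate the restart behavior described above, here is a minimal, self-contained sketch. It is not the code in this PR: `Metadata`, `OffsetLog`, `restart`, and the `warnings` buffer are hypothetical stand-ins for `OffsetSeqMetadata`, the offset log, `StreamExecution.populateStartOffsets`, and `logWarning`. The sketch assumes that each restart writes one new batch carrying the value in use.

```scala
// Simplified, hypothetical model of the restart semantics; NOT Spark's real classes.
object ShufflePartitionsSketch {
  val ShufflePartitionsKey = "spark.sql.shuffle.partitions"

  // Stand-in for OffsetSeqMetadata: only the recorded conf matters here.
  final case class Metadata(conf: Map[String, String] = Map.empty)

  // Stand-in for the offset log: one metadata entry per batch id.
  final class OffsetLog {
    private var entries = Map.empty[Long, Metadata]

    def add(batchId: Long, metadata: Metadata): Unit =
      entries = entries.updated(batchId, metadata)

    def get(batchId: Long): Option[Metadata] = entries.get(batchId)

    // Returns the entry with the highest batch id, if any.
    def getLatest: Option[(Long, Metadata)] =
      if (entries.isEmpty) None else Some(entries.maxBy(_._1))
  }

  // Stand-in for logWarning: collected so the behavior can be inspected.
  var warnings = Vector.empty[String]

  // On restart, prefer the value recorded in the latest offsets; otherwise
  // fall back to the session conf and warn. Older entries are never rewritten.
  def restart(log: OffsetLog, sessionShufflePartitions: Int): Int = {
    val recorded: Option[Int] = log.getLatest
      .flatMap { case (_, metadata) => metadata.conf.get(ShufflePartitionsKey) }
      .map(_.toInt)

    val partitions = recorded.getOrElse {
      warnings = warnings :+
        "Number of shuffle partitions from previous run not found in checkpoint."
      sessionShufflePartitions
    }

    // Assumption: the next batch written out records the value in use.
    val nextBatchId = log.getLatest.map(_._1 + 1).getOrElse(0L)
    log.add(nextBatchId, Metadata(Map(ShufflePartitionsKey -> partitions.toString)))
    partitions
  }

  def main(args: Array[String]): Unit = {
    val log = new OffsetLog
    log.add(0L, Metadata()) // old-style entry without recorded partitions
    println(restart(log, 200)) // first restart: falls back to the session conf
    println(restart(log, 10))  // later restart: uses the recorded value
    println(warnings.size)
  }
}
```

    In this model the entry for batch 0 keeps its empty conf after both restarts. Only the latest entry is consulted, which is why the fallback path (and its warning) is taken on the first restart only.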



