Boyang Jerry Peng created SPARK-53784:
-----------------------------------------

             Summary: Additional Source APIs needed to support RTM execution
                 Key: SPARK-53784
                 URL: https://issues.apache.org/jira/browse/SPARK-53784
             Project: Spark
          Issue Type: Story
          Components: Structured Streaming
    Affects Versions: 4.1.0
            Reporter: Boyang Jerry Peng


Currently in Structured Streaming, start and end offsets are determined at the 
driver prior to running the micro-batch.  In real-time mode, end offsets are 
not known apriori. They
are communicated to the driver later by the executors at the end of microbatch 
that runs for a fixed amount of time.  Thus, we need to add additional APIs in 
the source to support this kind of behavior.

The lifecycle of the new API is the following
 # prepareForRealTimeMode
 ** 
Called during logical planning to inform the source if it's in real time mode
 # planInputPartitions
 ** The driver plans partitions via planPartitions but only a starting offset 
is provided (Compared to existing execution modes that require planPartitions 
to provide both a starting and end offset)
 # 
mergeOffsets
 ## 
Merge partitioned offsets coming from partitions/tasks to a single global 
offset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to