Boyang Jerry Peng created SPARK-53784:
-----------------------------------------
Summary: Additional Source APIs needed to support RTM execution
Key: SPARK-53784
URL: https://issues.apache.org/jira/browse/SPARK-53784
Project: Spark
Issue Type: Story
Components: Structured Streaming
Affects Versions: 4.1.0
Reporter: Boyang Jerry Peng
Currently in Structured Streaming, start and end offsets are determined at the
driver before the micro-batch runs. In real-time mode (RTM), end offsets are
not known a priori. They are communicated to the driver by the executors at
the end of a micro-batch that runs for a fixed amount of time. Thus, we need
to add APIs to the source to support this behavior.
The lifecycle of the new APIs is as follows:
# prepareForRealTimeMode
** Called during logical planning to inform the source that it is running in
real-time mode.
# planInputPartitions
** The driver plans partitions via planInputPartitions, but only a starting
offset is provided (existing execution modes require planInputPartitions to
receive both a start and an end offset).
# mergeOffsets
** Merges the per-partition offsets reported by partitions/tasks into a single
global offset.
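
The three lifecycle steps above can be sketched as a toy model. This is only an illustration in Python, not the proposed Spark API: the class names, method signatures, the dict-based offset representation, and the choice to keep the highest offset per partition when merging are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionOffset:
    """Hypothetical per-task report: the offset reached for one source partition."""
    partition: int
    offset: int


class SketchRealTimeSource:
    """Toy source that follows the three lifecycle steps (not Spark's interface)."""

    def __init__(self) -> None:
        self.real_time_mode = False

    def prepare_for_real_time_mode(self) -> None:
        # Step 1: called during logical planning to tell the source it is in RTM.
        self.real_time_mode = True

    def plan_input_partitions(self, start: Dict[int, int]) -> List[PartitionOffset]:
        # Step 2: only the start offset is known, so each planned partition
        # carries a starting position and no end position.
        if not self.real_time_mode:
            raise RuntimeError("prepare_for_real_time_mode must be called first")
        return [PartitionOffset(p, o) for p, o in sorted(start.items())]

    def merge_offsets(self, reported: List[PartitionOffset]) -> Dict[int, int]:
        # Step 3: fold the offsets reported by tasks into one global offset.
        # Assumption: if a partition is reported more than once, keep the highest.
        merged: Dict[int, int] = {}
        for r in reported:
            merged[r.partition] = max(merged.get(r.partition, r.offset), r.offset)
        return merged


source = SketchRealTimeSource()
source.prepare_for_real_time_mode()
planned = source.plan_input_partitions({0: 100, 1: 250})
# Pretend each task consumed 10 records before the batch duration elapsed.
reported = [PartitionOffset(p.partition, p.offset + 10) for p in planned]
end_offset = source.merge_offsets(reported)
```

In this sketch the end offset only exists after the tasks report back, which is the key difference from the existing modes, where the driver fixes both ends of the range before the batch starts.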