The current latency limitations in Spark Structured Streaming (SSS) come from
micro-batching. If you are going to shrink the micro-batch interval, that
reduction must be balanced against the available processing capacity of the
cluster to prevent back pressure and instability. As with Continuous
Processing mode, the choice of a specific trigger with a desired checkpoint
interval, quoting from the doc:
"
df.writeStream
.format("...")
.option("...")
.trigger(Trigger.RealTime(“300 Seconds”)) // new trigger type to
enable real-time Mode
.start()
This Trigger.RealTime signals that the query should run in the new ultra
low-latency execution mode. A time interval can also be specified, e.g.
“300 Seconds”, to indicate how long each micro-batch should run for.
"
will inevitably depend on many factors. It is not that simple.
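
For comparison, this is roughly how that balance is expressed today with the
existing triggers. A minimal sketch only (the Kafka source, broker, topic and
checkpoint paths are placeholders; maxOffsetsPerTrigger is the usual knob for
capping micro-batch size to ease back pressure):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("trigger-sketch").getOrCreate()

// Placeholder Kafka source; maxOffsetsPerTrigger limits records per
// micro-batch (applies to micro-batch mode) to help avoid back pressure.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // placeholder broker
  .option("subscribe", "events")                     // placeholder topic
  .option("maxOffsetsPerTrigger", "100000")
  .load()

// Today's micro-batch path: the trigger interval has to be balanced
// against the cluster's processing capacity.
val microBatchQuery = df.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/ckpt-micro")   // placeholder path
  .trigger(Trigger.ProcessingTime("1 minute"))
  .start()

// Existing Continuous Processing mode: the argument is a checkpoint
// interval, not a batch duration, and only a limited set of sources,
// sinks and operations are supported.
val continuousQuery = df.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/ckpt-cont")    // placeholder path
  .trigger(Trigger.Continuous("1 second"))
  .start()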
HTH
Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
On Wed, 28 May 2025 at 05:13, Jerry Peng <[email protected]>
wrote:
> Hi all,
>
> I want to start a discussion thread for the SPIP titled “Real-Time Mode in
> Apache Spark Structured Streaming” that I've been working on with Siying
> Dong, Indrajit Roy, Chao Sun, Jungtaek Lim, and Michael Armbrust: [JIRA
> <https://issues.apache.org/jira/browse/SPARK-52330>] [Doc
> <https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing>
> ].
>
> The SPIP proposes a new execution mode called “Real-time Mode” in Spark
> Structured Streaming that significantly lowers end-to-end latency for
> processing streams of data.
>
> A key principle of this proposal is compatibility. Our goal is to make
> Spark capable of handling streaming jobs that need results almost
> immediately (within O(100) milliseconds). We want to achieve this without
> changing the high-level DataFrame/Dataset API that users already use – so
> existing streaming queries can run in this new ultra-low-latency mode by
> simply turning it on, without rewriting their logic.
>
> In short, we’re trying to enable Spark to power real-time applications
> (like instant anomaly alerts or live personalization) that today cannot
> meet their latency requirements with Spark’s current streaming engine.
>
> We'd greatly appreciate your feedback, thoughts, and suggestions on this
> approach!
>
>