Boyang Jerry Peng created SPARK-52330:
-----------------------------------------

             Summary: SPIP: Real-Time Mode in Apache Spark Structured Streaming
                 Key: SPARK-52330
                 URL: https://issues.apache.org/jira/browse/SPARK-52330
             Project: Spark
          Issue Type: Umbrella
          Components: Structured Streaming
    Affects Versions: 4.1.0
            Reporter: Boyang Jerry Peng


We propose to add a *real-time mode* to Spark Structured Streaming that 
significantly lowers end-to-end latency for processing streams of data. Our 
goal is to make Spark capable of handling streaming jobs that need results 
*almost immediately (within O(100) milliseconds)*. We want to achieve this 
*without changing the high-level DataFrame/Dataset API* that users already use, 
so existing streaming queries can run in this new ultra-low-latency mode by 
simply turning it on, without rewriting their logic. 
In short, we’re trying to enable Spark to power *real-time applications* (like 
instant anomaly alerts or live personalization) that today cannot meet their 
latency requirements with Spark’s current streaming engine.


SPIP doc: 
https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
