GitHub user aglinxinyuan edited a discussion: Support Batch Execution Mode

I would like to introduce the idea of supporting multiple runtime execution 
modes that users can choose from based on the requirements of their use case 
and the characteristics of their jobs.

The current (and default) execution behavior of our engine is what we call 
pipelined, or STREAMING, execution mode. In this mode, each operator performs 
continuous, incremental processing as data flows through the pipeline.
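To make the pipelined behavior concrete, here is a minimal sketch of an operator that emits an incremental result for every tuple it receives. The function name and shape are hypothetical illustrations, not Texera's actual operator API:

```python
def pipelined_sum(stream):
    """Toy STREAMING-mode operator: emit a running sum after every tuple.

    Data flows through continuously; each incoming value immediately
    produces an updated (incremental) result downstream.
    """
    total = 0
    for value in stream:
        total += value
        yield total  # one incremental output per input tuple

# Each element of the output reflects the state after one more input:
print(list(pipelined_sum([1, 2, 3])))  # -> [1, 3, 6]
```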

In addition, we plan to support a batch-style execution mode, referred to as 
BATCH execution mode. This mode executes jobs in a manner more reminiscent of 
traditional batch processing. We intend to enable this mode via a configuration 
flag.
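As a rough sketch of what such a configuration flag could look like, the snippet below models the two modes as an enum on a job configuration object. All names here (`ExecutionMode`, `JobConfig`) are hypothetical and only illustrate the idea; the actual flag name and mechanism would be decided during implementation:

```python
from enum import Enum

class ExecutionMode(Enum):
    STREAMING = "streaming"  # default: pipelined, incremental processing
    BATCH = "batch"          # staged, batch-style processing

class JobConfig:
    """Hypothetical job configuration carrying the execution-mode flag."""
    def __init__(self, execution_mode: ExecutionMode = ExecutionMode.STREAMING):
        self.execution_mode = execution_mode

# Users opt in to batch-style execution explicitly:
config = JobConfig(execution_mode=ExecutionMode.BATCH)
```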

Our unified approach to stream and batch processing ensures that applications 
executed over bounded inputs will produce the same final results regardless of 
the selected execution mode. Enabling BATCH execution allows the engine to 
apply additional optimizations that are only possible when operators know that 
their inputs are bounded. 
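The "same final results" guarantee can be illustrated with a toy aggregation over a bounded input. In STREAMING mode the operator emits every intermediate update, while in BATCH mode it consumes the whole bounded input and emits once; the final value is identical either way. This is an illustrative sketch, not engine code:

```python
def streaming_sum(stream):
    """STREAMING mode: emit a running sum after every incoming tuple."""
    total = 0
    for value in stream:
        total += value
        yield total  # every intermediate update is visible downstream

def batch_sum(data):
    """BATCH mode: consume the entire bounded input, emit one final result."""
    return sum(data)

data = [1, 2, 3, 4]                        # a bounded input
streaming_results = list(streaming_sum(data))  # [1, 3, 6, 10]

# Both modes agree on the final result over a bounded input:
assert streaming_results[-1] == batch_sum(data)  # 10 == 10
```

Because BATCH mode knows the input is finite, it is free to reorder, stage, or sort work internally, which is exactly the kind of optimization that pipelined execution cannot assume.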

Below is an example illustrating the difference between the two execution modes:

![Streaming](https://github.com/user-attachments/assets/e3dbb111-96f1-4fbb-866a-592b948cd577)
![Batch](https://github.com/user-attachments/assets/fb79039f-76fb-4dea-874c-e47d0fe43182)


GitHub link: https://github.com/apache/texera/discussions/4149
