Apache Wayang Execution Modes & Failure Recovery - Clarification Needed

Mirko Kaempf, Dr. Tue, 30 Dec 2025 08:30:20 -0800

*Hi team,*

I'm working on understanding Apache Wayang's execution semantics,
particularly around job execution modes and failure recovery. I've
identified some questions where I need clarity on what's already planned
vs. what still needs definition.

*Core Question:*

When we execute an Apache Wayang plan, is it:
- A *context-bound operation* (like a Spark batch job that processes files
present at planning time)
- A *continuously running operation* (like a streaming job processing
real-time data)
- *Something that can be both*, depending on configuration?

I think that this distinction matters significantly for:
- Which data gets included in processing
- How failures should be handled
- What "job completion" means
- State management requirements, if any apply

*Specific Scenarios I'm Trying to Understand*

*Scenario 1: Batch Processing*

- Spark-style job processes files available at job planning time
- New data arriving later requires re-execution
- Clear start and end points

*Scenario 2: Real-time Streaming*

- Continuous data flow with "here and now" validity
- Job potentially runs indefinitely
- Requires state management for pause/resume

*Scenario 3: Historical Streaming*

- Stream processing over existing, closed datasets (e.g., data partitions
or a fixed time range)
- Bounded data, but streaming semantics
- How does this fit into the batch vs. streaming distinction?
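
One way to make the three scenarios comparable is to separate two questions: is the input bounded, and is it processed as a whole or incrementally? The sketch below is only an illustration of that idea. All names in it (Boundedness, Processing, ExecutionMode) are hypothetical and not part of Wayang's API, and the assumption that any streaming-style job needs saved state to resume is mine, extended from Scenario 2.

```python
from dataclasses import dataclass
from enum import Enum


class Boundedness(Enum):
    BOUNDED = "bounded"      # input is fixed when the plan is built
    UNBOUNDED = "unbounded"  # input keeps arriving while the job runs


class Processing(Enum):
    BATCH = "batch"          # whole input is processed as one unit
    STREAMING = "streaming"  # records are processed incrementally


@dataclass(frozen=True)
class ExecutionMode:
    """Hypothetical classification; not an existing Wayang type."""
    boundedness: Boundedness
    processing: Processing

    def has_natural_end(self) -> bool:
        # A job over bounded input finishes once the input is exhausted.
        return self.boundedness is Boundedness.BOUNDED

    def needs_state_for_resume(self) -> bool:
        # Assumption: incremental processing needs saved progress to resume.
        return self.processing is Processing.STREAMING


# The three scenarios expressed in this scheme.
BATCH = ExecutionMode(Boundedness.BOUNDED, Processing.BATCH)
REALTIME_STREAMING = ExecutionMode(Boundedness.UNBOUNDED, Processing.STREAMING)
HISTORICAL_STREAMING = ExecutionMode(Boundedness.BOUNDED, Processing.STREAMING)
```

Under this view, Scenario 3 is not a separate kind of job: it has a clear end like Scenario 1 but shares the state requirements of Scenario 2.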

*What I Want to Understand Next:*

*A. Current State & Existing Plans:*
1 - Is there already a classification system for Wayang job types
(batch/streaming/hybrid)?
2 - What failure recovery mechanisms are currently implemented?
3 - How does Wayang currently handle state management across different
platforms (Flink vs. Spark)?
4 - Is there documentation on execution mode semantics that I've missed?

*B. Failure Recovery Semantics:*
- When a job fails, what determines whether it should:
(a) Resume from the failure point (requires state)
(b) Restart from the beginning
- How are "failure point" and "start position" currently defined?
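
To illustrate what I mean by options (a) and (b), here is a minimal sketch of the decision, assuming recovery depends only on whether a usable checkpoint survived the failure. The Checkpoint type, its position field, and choose_recovery are hypothetical names for discussion; they do not describe what Wayang or any backend does today.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class RecoveryAction(Enum):
    RESUME = "resume"    # option (a): continue from the failure point
    RESTART = "restart"  # option (b): start again from the beginning


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical: last input offset known to be fully processed.
    position: int


def choose_recovery(
    last_checkpoint: Optional[Checkpoint],
) -> Tuple[RecoveryAction, int]:
    """Return the recovery action and the input position to continue from."""
    if last_checkpoint is not None:
        # State survived the failure: continue right after the checkpoint.
        return RecoveryAction.RESUME, last_checkpoint.position + 1
    # No usable state: the only safe start position is the beginning.
    return RecoveryAction.RESTART, 0
```

In this sketch the "failure point" is approximated by the last checkpointed position, and the "start position" is offset 0. Whether those definitions match the intended semantics is exactly what I would like to clarify.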

*C. Job Lifecycle:*
- How is "job completion" currently defined for different execution
patterns?
- For streaming jobs, what triggers termination?
- How do we distinguish between "still running as designed" vs. "needs
intervention"?
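
As a starting point for discussion, the lifecycle questions could be expressed as a small status check. This is a sketch under stated assumptions: completion only exists for bounded input, and a job that reports no progress for longer than a threshold is treated as stalled. The function classify_job and all of its parameters are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    COMPLETED = "completed"
    RUNNING_AS_DESIGNED = "running as designed"
    NEEDS_INTERVENTION = "needs intervention"


def classify_job(
    bounded: bool,
    input_exhausted: bool,
    seconds_since_progress: float,
    stall_threshold_s: float,
) -> JobStatus:
    """Classify a job from a few hypothetical monitoring signals."""
    if bounded and input_exhausted:
        # Bounded input has been fully consumed: the job is done.
        return JobStatus.COMPLETED
    if seconds_since_progress > stall_threshold_s:
        # No progress for too long: flag the job for a human to look at.
        return JobStatus.NEEDS_INTERVENTION
    return JobStatus.RUNNING_AS_DESIGNED
```

A known weakness of this rule is that an unbounded job with a quiet source looks the same as a stalled one, so a real definition would likely need a signal from the source as well.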

If these semantics aren't fully defined yet, I suggest we draft a
classification framework and discuss it.

*Why This Matters*

Without clear execution mode semantics, it's difficult to:

- Design proper monitoring and alerting
- Implement reliable failure recovery
- Communicate expectations to users
- Choose appropriate platform backends

I'm happy to contribute to defining this if it's not yet formalized, but
first I need to understand what's already in place or planned.

*Looking forward to your insights!*

Best regards,
Mirko

Dr. Mirko Kämpf
*Co-Founder - Scalytics Inc.*

*Why Scalytics:*
Break data silos. Put your data to work. All without ever exposing your
data. *That's the Scalytics difference.*

--
3401 N. MIAMI AVE. STE 230
33127 Miami, Florida
United States
www.scalytics.io

-- Please consider the environment before printing this email --

Disclaimer:
The content of this message is confidential. If you have received it by
mistake, please inform us by an email reply and then delete the message. It
is forbidden to copy, forward, or in any way reveal the contents of this
message to anyone. The integrity and security of this email cannot be
guaranteed over the Internet. Therefore, the sender will not be held liable
for any damage caused by the message.
