Hi Ron,

Thank you for the positive feedback and the insightful questions. Here are the clarifications regarding the scope and interaction of these clauses:

1. Applicability to FULL (Batch) vs. CONTINUOUS Modes

You are correct that there is a distinction here.

- STATE_RETENTION: This clause is not applicable in FULL mode. Since batch executions are inherently stateless, there is no state to retain or drop.
- START_MODE: This clause is technically applicable to FULL mode as well. In that context, it would effectively act as a filter (e.g., FROM_TIMESTAMP), limiting the scope of the data processed in the batch run.

2. Interaction between Source Offsets and State Retention

This is a great point regarding the definition of "state."

- State: If STATE_RETENTION = NONE, Flink discards the operator state (aggregates, joins).
- Offsets: Offsets are special because they are not necessarily part of the Flink state. They could be stored in other systems, such as the source system (e.g., Kafka consumer groups) or an external database.

To simplify the mental model, we propose to decouple offset management from state management. If a user wants to RESUME, the system uses whatever information is available to it to fulfill the desired behavior, even if that information technically comes from the state that is being discarded (e.g., by extracting the offset before dropping the checkpoint).

Best,
Ramin

On Thu, Dec 25, 2025 at 1:41 PM Ron Liu <[email protected]> wrote:

> Hi, Ramin
>
> Thank you very much for initiating this FLIP. The proposed extension to the
> existing syntax is highly valuable and will significantly improve the user
> experience.
>
> After carefully reviewing the FLIP, I have two questions:
>
> 1. Are the START_MODE and STATE_RETENTION clauses applicable only to
> CONTINUOUS mode? In FULL mode, since the job is stateless, do these clauses
> take effect at all?
>
> 2. Regarding the STATE_RETENTION clause: from your current design, it
> appears to control whether the state from operators *other than* the
> SourceOperator is retained. However, the source offset itself is also part
> of the job's state.
> From a state management perspective, do START_MODE and
> STATE_RETENTION exhibit some functional overlap or ambiguity? For example,
> if STATE_RETENTION = NONE it means discard *all* state, including the source
> offset, then when START_MODE = RESUME_OR_FROM_NOW, would it still be
> possible to resume from the corresponding source offset?
>
> Best,
>
> Ron
>
> On Thu, Dec 18, 2025 at 19:46, Timo Walther <[email protected]> wrote:
>
> > Hi Ramin,
> >
> > thank you for proposing this FLIP. It nicely extends the existing
> > semantics and improves the experience esp. for the CONTINUOUS mode. But
> > even in FULL mode, people don't want to pay for full processing and
> > would like to have more control over what will be reprocessed in case of
> > a query evolution. Sometimes also to skip malformed data.
> >
> > I was heavily involved in the FLIP discussions and it took us quite a
> > few iterations to end up with this concise syntax that nicely fits into
> > Flink semantics of savepoints and time travelling (FLIP-308), while
> > keeping new developments such as PTFs (FLIP-440) in the picture as well.
> >
> > +1 to this proposal.
> >
> > Cheers,
> > Timo
> >
> >
> > On 01.12.25 18:23, Ramin Gharib wrote:
> > > Hi everyone,
> > >
> > > I would like to start a discussion on FLIP-557: Granular Control over Data
> > > Reprocessing and State Retention in Materialized Table Evolution [1].
> > >
> > > Currently, ALTER MATERIALIZED TABLE forces a full job restart and discards
> > > state, which is inefficient for many evolution scenarios. FLIP-557 proposes
> > > decoupling data scope from state management by introducing two new optional
> > > clauses:
> > > 1. START_MODE: Controls the data processing window (e.g., FROM_BEGINNING,
> > > RESUME_OR_...).
> > >
> > > 2. STATE_RETENTION: Controls how existing state is handled (e.g., NONE,
> > > PTF_ONLY).
> > >
> > > This gives users explicit control over cost and correctness during table
> > > evolution.
> > >
> > > For more details, please refer to the FLIP [1].
> > >
> > > Looking forward to your feedback and thoughts!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-557%3A+Granular+Control+over+Data+Reprocessing+and+State+Retention+in+Materialized+Table+Evolution
> > >
> > > Best regards,
> > >
> > > Ramin Gharib
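
P.S. As a rough illustration of the decoupling described in point 2 of the reply at the top of this thread, here is a minimal Python sketch. It is not Flink code and not taken from the FLIP: `Checkpoint`, `plan_restart`, and the `external_offset` parameter are hypothetical names, and preferring the checkpoint offset over an external one is an assumption made only for this example. The point it shows is the ordering: the start position is resolved first, from whatever source has an offset, and only then is the operator state dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Checkpoint:
    """Toy stand-in for a checkpoint (hypothetical, not a Flink class)."""
    operator_state: Dict[str, int] = field(default_factory=dict)
    source_offset: Optional[int] = None


def plan_restart(
    start_mode: str,
    state_retention: str,
    checkpoint: Optional[Checkpoint] = None,
    external_offset: Optional[int] = None,
) -> Tuple[Tuple[str, Optional[int]], Dict[str, int]]:
    """Return (start position, operator state to restore)."""
    # Step 1: resolve the start position while the checkpoint is still readable.
    if start_mode == "RESUME_OR_FROM_NOW":
        # Assumption: try the checkpoint first, then an external store
        # (e.g. a consumer group offset), then fall back to "now".
        offset = checkpoint.source_offset if checkpoint is not None else None
        if offset is None:
            offset = external_offset
        start = ("OFFSET", offset) if offset is not None else ("NOW", None)
    elif start_mode == "FROM_BEGINNING":
        start = ("BEGINNING", None)
    else:
        raise ValueError(f"start mode not covered by this sketch: {start_mode}")

    # Step 2: decide on operator state independently of step 1.
    if state_retention == "NONE":
        restored_state: Dict[str, int] = {}  # aggregates, joins etc. are dropped
    else:
        raise NotImplementedError(
            f"state retention not covered by this sketch: {state_retention}"
        )
    return start, restored_state


if __name__ == "__main__":
    cp = Checkpoint(operator_state={"agg": 1}, source_offset=42)
    # The offset survives even though the operator state is discarded.
    print(plan_restart("RESUME_OR_FROM_NOW", "NONE", cp))  # (('OFFSET', 42), {})
```

With no checkpoint and no external offset, the same call falls back to `("NOW", None)`, which is the "OR_FROM_NOW" half of the mode name.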
