Hey Mika,

Thanks for putting this FLIP together. A dedicated test harness for PTFs is a welcome addition. The builder-pattern API and the state/timer introspection features are well thought out.
I have a few questions and suggestions after reviewing the FLIP:

1. The FLIP has good examples of harness construction via the builder, but doesn't address lifecycle management. For comparison, the existing DataStream test harnesses (documented at [1]) have an explicit open() and implement AutoCloseable. Since PTFs can acquire resources in open(), the harness needs to manage this lifecycle. Could you clarify how open()/close() on the underlying PTF is handled? Is close() called automatically, or does the user need to trigger it? An end-to-end example showing cleanup would help.

2. The existing operator test harnesses support snapshot() and initializeState(OperatorSubtaskState) to simulate checkpoint/restore cycles. This is important for catching state serialization bugs, which are a common source of production issues. The FLIP provides withInitialState() for setup, but there's no way to take a snapshot mid-test and restore into a fresh harness. Are we deliberately excluding this, or should we consider adding it?

3. Related to the above: PTFs with complex state (Map, List, POJO) can behave differently with heap vs. RocksDB backends due to serialization differences. The existing harnesses support setStateBackend(). Should the PTF test harness support this as well? At minimum, it would be good to document which backend is used by default.

4. withTableArgument(String tableName, List<Row> rows) is useful for testing join-like PTFs. The builder Javadoc describes when static rows are passed and how table semantics (ROW_SEMANTIC vs. SET_SEMANTIC) affect delivery, but a few things remain unclear. How is the schema for these rows determined: is it inferred from the Row structure, or does it need to match the eval() signature's type hints? And what happens if a PTF reads from a table argument that hasn't been configured via the builder: does it receive null, or does the harness throw at build time?
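To make the checkpoint/restore point concrete, the round-trip I have in mind mirrors what the existing DataStream harnesses (see [1]) already allow. This is only a rough sketch against the current flink test-utils API; MyKeyedFunction, inputRow, and the Row-based type parameters are placeholders, not anything from the FLIP:

```
// Sketch: snapshot mid-test, then restore into a fresh harness to
// exercise state serialization. Placeholders: MyKeyedFunction, inputRow.
KeyedOneInputStreamOperatorTestHarness<String, Row, Row> harness =
        new KeyedOneInputStreamOperatorTestHarness<>(
                new KeyedProcessOperator<>(new MyKeyedFunction()),
                row -> row.getFieldAs(0),   // key selector
                Types.STRING);
harness.open();
harness.processElement(inputRow, 1L);

// Take a snapshot mid-test...
OperatorSubtaskState snapshot = harness.snapshot(0L, 1L);
harness.close();

// ...and feed it into a second harness before open(), as users do today
// to verify that state survives a serialization round-trip.
KeyedOneInputStreamOperatorTestHarness<String, Row, Row> restored =
        new KeyedOneInputStreamOperatorTestHarness<>(
                new KeyedProcessOperator<>(new MyKeyedFunction()),
                row -> row.getFieldAs(0),
                Types.STRING);
restored.initializeState(snapshot);
restored.open();
```

Something analogous on the PTF harness builder would close the gap.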
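On the changelog consistency idea among the smaller points below: the core check is cheap to sketch. The following is a self-contained illustration only; the Kind enum and Change record are stand-ins for Flink's RowKind and Row, and none of the names here are proposed harness API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration: per key, an UPDATE_AFTER must be immediately preceded by an
// UPDATE_BEFORE for the same key. Kind/Change stand in for RowKind/Row.
class ChangelogCheck {
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }
    record Change(Kind kind, String key) {}

    static List<String> validate(List<Change> changes) {
        List<String> violations = new ArrayList<>();
        Map<String, Kind> lastPerKey = new HashMap<>();
        for (Change c : changes) {
            if (c.kind() == Kind.UPDATE_AFTER
                    && lastPerKey.get(c.key()) != Kind.UPDATE_BEFORE) {
                violations.add("UPDATE_AFTER for key '" + c.key()
                        + "' without preceding UPDATE_BEFORE");
            }
            lastPerKey.put(c.key(), c.kind());
        }
        return violations;
    }
}
```

An opt-in check of this shape, run over the harness output, would catch a common class of PTF bugs without any engine involvement.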
A few smaller points:

- Timer.hasFired() is typed as boolean (primitive) but annotated @Nullable. This looks like a bug: should it be Boolean (boxed)?
- getOutputByKind(RowKind) implies that output preserves RowKind metadata. Could you confirm that getOutput() also retains this? The generic <OUT> type parameter could use more specification on what's guaranteed.
- Have you considered optional changelog consistency validation (e.g., verifying UPDATE_BEFORE precedes UPDATE_AFTER for the same key)? It could be a useful debugging aid.
- What's the error model when eval() or a timer callback throws? Is the exception propagated directly, or wrapped?
- The test plan mentions leveraging ProcessTableFunctionTestPrograms. Could you clarify whether the harness will be validated against those scenarios, or whether it's intended to replace them for certain use cases?

Overall I'm +1 on the direction. The core API design is clean and covers the main testing needs well. Addressing the lifecycle and checkpoint/restore gaps would bring it in line with what Flink users already have for DataStream UDF testing.

Thanks,
Martijn

[1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/testing/#unit-testing-stateful-or-timely-udfs--custom-operators

On Fri, Mar 6, 2026 at 9:30 AM Mika Naylor <[email protected]> wrote:

> Hey David,
>
> Yeah, I think in terms of scope I aim more for providing a framework for
> unit testing the behavior of custom PTFs. I'd like to include as much
> validation as possible, but there might be validation steps that aren't
> possible to do without dipping into the engine side of things.
>
> I'm not entirely sure on the real/processing time considerations - my aim
> here was mostly around letting users validate timer behaviour, and timer
> registration/firing in PTFs is based on watermarks, if I read the doc
> correctly.
>
> Kind regards,
> Mika
>
> On Wed, 4 Mar 2026, at 10:38 AM, David Radley wrote:
> > Hi Mika,
> > This sounds like a good idea. In terms of scope, is the idea that this
> > is purely for unit tests, or is this additionally proposed as a validation /
> > test harness for use when developing custom PTFs?
> > I guess this allows us to create a common set of tests that all PTFs
> > need to pass using this harness.
> >
> > I would assume there are real (not event) time considerations for some
> > PTFs; it would be worth mentioning how we should handle that.
> >
> > Kind regards, David.
> >
> > From: Mika Naylor <[email protected]>
> > Date: Tuesday, 3 March 2026 at 16:46
> > To: [email protected] <[email protected]>
> > Subject: [EXTERNAL] [DISCUSS] FLIP-567: Introduce a ProcessTableFunction
> > Test Harness
> >
> > Hey everyone!
> >
> > I would like to kick off a discussion on FLIP-567: Introduce a
> > ProcessTableFunction Test Harness [1].
> >
> > Currently, testing PTFs requires full integration tests against a running
> > Flink cluster. This FLIP would introduce a developer-friendly test harness
> > for unit testing PTFs and would provide introspection into output, state,
> > timers, and watermarks for assertions and behaviour validation. This would
> > let developers iterate on and test their PTFs without needing to run a
> > full-scale integration test against a live Flink cluster.
> >
> > Would love any thoughts and feedback the community might have on this
> > proposal.
> >
> > Kind regards,
> > Mika Naylor
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-567%3A+Introduce+a+ProcessTableFunction+Test+Harness
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: Building C, IBM Hursley Office, Hursley Park Road,
> > Winchester, Hampshire SO21 2JN
