Hey Mika,

Thanks for putting this FLIP together. A dedicated test harness for PTFs is a welcome addition. The builder-pattern API and the state/timer introspection features are well thought out.
I have a few questions and suggestions after reviewing the FLIP:

1. The FLIP has good examples of harness construction via the builder, but doesn't address lifecycle management. For comparison, the existing DataStream test harnesses (documented at [1]) have an explicit open() and implement AutoCloseable. Since PTFs can acquire resources in open(), the harness needs to manage this lifecycle. Could you clarify how open()/close() on the underlying PTF is handled? Is close() called automatically, or does the user need to trigger it? An end-to-end example showing cleanup would help.

2. The existing operator test harnesses support snapshot() and initializeState(OperatorSubtaskState) to simulate checkpoint/restore cycles. This is important for catching state serialization bugs, which are a common source of production issues. The FLIP provides withInitialState() for setup, but there's no way to take a snapshot mid-test and restore into a fresh harness. Are we deliberately excluding this, or should we consider adding it?

3. Related to the above: PTFs with complex state (Map, List, POJO) can behave differently with heap vs. RocksDB backends due to serialization differences. The existing harnesses support setStateBackend(). Should the PTF test harness support this as well? At minimum, it would be good to document which backend is used by default.

4. withTableArgument(String tableName, List<Row> rows) is useful for testing join-like PTFs. The builder Javadoc describes when static rows are passed and how table semantics (ROW_SEMANTIC vs. SET_SEMANTIC) affect delivery, but a few things remain unclear. How is the schema for these rows determined: is it inferred from the Row structure, or does it need to match the eval() signature's type hints? And what happens if a PTF reads from a table argument that hasn't been configured via the builder: does it receive null, or does the harness throw at build time?
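To make the checkpoint/restore point concrete, the round-trip I have in mind mirrors what the existing DataStream harnesses (see [1]) already allow. This is only a rough sketch against the current flink test-utils API; MyKeyedFunction, inputRow, and the Row-based type parameters are placeholders, not anything from the FLIP:

```
// Sketch: snapshot mid-test, then restore into a fresh harness to
// exercise state serialization. Placeholders: MyKeyedFunction, inputRow.
KeyedOneInputStreamOperatorTestHarness<String, Row, Row> harness =
        new KeyedOneInputStreamOperatorTestHarness<>(
                new KeyedProcessOperator<>(new MyKeyedFunction()),
                row -> row.getFieldAs(0),   // key selector
                Types.STRING);
harness.open();
harness.processElement(inputRow, 1L);

// Take a snapshot mid-test...
OperatorSubtaskState snapshot = harness.snapshot(0L, 1L);
harness.close();

// ...and feed it into a second harness before open(), as users do today
// to verify that state survives a serialization round-trip.
KeyedOneInputStreamOperatorTestHarness<String, Row, Row> restored =
        new KeyedOneInputStreamOperatorTestHarness<>(
                new KeyedProcessOperator<>(new MyKeyedFunction()),
                row -> row.getFieldAs(0),
                Types.STRING);
restored.initializeState(snapshot);
restored.open();
```

Something analogous on the PTF harness builder would close the gap.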
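On the changelog consistency idea among the smaller points below: the core check is cheap to sketch. The following is a self-contained illustration only; the Kind enum and Change record are stand-ins for Flink's RowKind and Row, and none of the names here are proposed harness API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration: per key, an UPDATE_AFTER must be immediately preceded by an
// UPDATE_BEFORE for the same key. Kind/Change stand in for RowKind/Row.
class ChangelogCheck {
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }
    record Change(Kind kind, String key) {}

    static List<String> validate(List<Change> changes) {
        List<String> violations = new ArrayList<>();
        Map<String, Kind> lastPerKey = new HashMap<>();
        for (Change c : changes) {
            if (c.kind() == Kind.UPDATE_AFTER
                    && lastPerKey.get(c.key()) != Kind.UPDATE_BEFORE) {
                violations.add("UPDATE_AFTER for key '" + c.key()
                        + "' without preceding UPDATE_BEFORE");
            }
            lastPerKey.put(c.key(), c.kind());
        }
        return violations;
    }
}
```

An opt-in check of this shape, run over the harness output, would catch a common class of PTF bugs without any engine involvement.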
A few smaller points:

- Timer.hasFired() is typed as boolean (primitive) but annotated @Nullable. This looks like a bug: should it be Boolean (boxed)?
- getOutputByKind(RowKind) implies that output preserves RowKind metadata. Could you confirm that getOutput() also retains this? The generic <OUT> type parameter could use more specification on what's guaranteed.
- Have you considered optional changelog consistency validation (e.g., verifying UPDATE_BEFORE precedes UPDATE_AFTER for the same key)? It could be a useful debugging aid.
- What's the error model when eval() or a timer callback throws? Is the exception propagated directly, or wrapped?
- The test plan mentions leveraging ProcessTableFunctionTestPrograms. Could you clarify whether the harness will be validated against those scenarios, or whether it's intended to replace them for certain use cases?

Overall I'm +1 on the direction. The core API design is clean and covers the main testing needs well. Addressing the lifecycle and checkpoint/restore gaps would bring it in line with what Flink users already have for DataStream UDF testing.

Thanks,
Martijn

[1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/testing/#unit-testing-stateful-or-timely-udfs--custom-operators

On Fri, Mar 6, 2026 at 9:30 AM Mika Naylor <[email protected]> wrote:

> Hey David,
>
> Yeah, I think in terms of scope I aim more for providing a framework for
> unit testing the behavior of custom PTFs. I'd like to include as much
> validation as possible, but there might be validation steps that aren't
> possible to do without dipping into the engine side of things.
>
> I'm not entirely sure on the real/processing time considerations - my aim
> here was mostly around letting users validate timer behaviour, and timer
> registration/firing in PTFs is based on watermarks, if I read the doc
> correctly.
>
> Kind regards,
> Mika
>
> On Wed, 4 Mar 2026, at 10:38 AM, David Radley wrote:
> > Hi Mika,
> > This sounds like a good idea. In terms of scope, is the idea that this
> > is purely for unit tests, or is this additionally proposed as a validation /
> > test harness for use when developing custom PTFs?
> > I guess this allows us to create a common set of tests that all PTFs
> > need to pass using this harness.
> >
> > I would assume there are real (not event) time considerations for some
> > PTFs; it would be worth mentioning how we should handle that.
> >
> > Kind regards, David.
> >
> > From: Mika Naylor <[email protected]>
> > Date: Tuesday, 3 March 2026 at 16:46
> > To: [email protected] <[email protected]>
> > Subject: [EXTERNAL] [DISCUSS] FLIP-567: Introduce a ProcessTableFunction
> > Test Harness
> >
> > Hey everyone!
> >
> > I would like to kick off a discussion on FLIP-567: Introduce a
> > ProcessTableFunction Test Harness [1].
> >
> > Currently, testing PTFs requires full integration tests against a running
> > Flink cluster. This FLIP would introduce a developer-friendly test harness
> > for unit testing PTFs and would provide introspection into output, state,
> > timers, and watermarks for assertions and behaviour validation. This would
> > let developers iterate on and test their PTFs without needing to run a
> > full-scale integration test against a live Flink cluster.
> >
> > Would love any thoughts and feedback the community might have on this
> > proposal.
> >
> > Kind regards,
> > Mika Naylor
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-567%3A+Introduce+a+ProcessTableFunction+Test+Harness
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: Building C, IBM Hursley Office, Hursley Park Road,
> > Winchester, Hampshire SO21 2JN
