Re: [DISCUSS] FLIP-599: State Catalog

Dennis-Mircea Ciupitu Fri, 03 Jul 2026 01:13:16 -0700

Hi Gyula,

Thanks for the detailed answers. This addresses my questions well and the
direction sounds great.


+1 (non-binding) from my side.

Best regards,
Dennis


On Thu, Jul 2, 2026 at 3:26 PM Gyula Fóra <[email protected]> wrote:

> Hi Dennis!
>
> Thank you for the questions. Much of the recent work on the state connector
> API has been aimed at exactly this kind of cataloging and flexible access.
> There are a few gaps and things that have to change, and not all of them are
> enumerated in the FLIP, but we should keep an open mind and make the
> necessary changes, as you said, to make this as complete as possible. Most
> state processor APIs are marked experimental, so we can be flexible within
> reason :)
>
> Now to the concrete questions:
>
> 1. Non-keyed state support / scope
> I think non-keyed states should definitely be in the scope of the FLIP in
> terms of design, and my intention was not to exclude them. I just focused
> on keyed state because that is readily available in our prototype
> implementation (without many changes to the existing connectors). I will
> try to update the FLIP to cover non-keyed states in more detail, but I
> think the case is pretty straightforward. From a table representation
> perspective, they can follow a similar pattern, such as
> uid_opUID_statename_broadcast and uid_opUID_statename_list. A corresponding
> SQL connector can easily be added to support these based on the existing
> DataStream connector. I will make sure to add separate tickets for these
> types of state once the FLIP is accepted; this work can very easily be
> parallelized across different state types within the existing catalog
> framework. This way keyed and non-keyed states will live together in
> a single catalog/db.
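
As an illustration of the naming pattern above, here is a minimal Java sketch
that derives table names for non-keyed state. StateTableNames, the Kind enum,
the union suffix and the character-sanitizing rule are all assumptions made
for this example; none of them come from the FLIP or the prototype.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Flink or the FLIP-599 prototype. */
final class StateTableNames {

    /** Non-keyed state kinds discussed in the thread; the UNION suffix is an assumption. */
    enum Kind { BROADCAST, LIST, UNION }

    private StateTableNames() {}

    /** Builds uid_<operatorUid>_<stateName>_<kind>, following the pattern quoted above. */
    static String tableName(String operatorUid, String stateName, Kind kind) {
        return String.join("_",
                "uid",
                sanitize(operatorUid),
                sanitize(stateName),
                kind.name().toLowerCase(Locale.ROOT));
    }

    /** Assumption: map anything outside [A-Za-z0-9_] to '_' so the name needs no quoting in SQL. */
    private static String sanitize(String part) {
        if (part == null || part.isEmpty()) {
            throw new IllegalArgumentException("name part must be non-empty");
        }
        return part.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        System.out.println(tableName("enrich", "rules", Kind.BROADCAST));
        System.out.println(tableName("kafka-source", "offsets", Kind.LIST));
    }
}
```

With these assumptions, an operator with UID "enrich" and a broadcast state
named "rules" maps to uid_enrich_rules_broadcast. Whether and how names get
sanitized is a design choice the FLIP would still have to make.
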
>
> In the future we can even go a step further and include connector-specific
> state views, such as Kafka offsets, via custom connector-specific plugins.
>
> 2/3. Serializer transparency and robustness
> From a practical standpoint, generated (synthetic) serializers, custom
> classes / Kryo, and pluggable logic could all work, but the catalog concept
> as a whole requires certain behaviour to be useful. The catalog would point
> to savepoint directories and discover all state in them (potentially from
> multiple jobs). Configuration has to be done in a generic way. I don't see
> a problem with introducing configs for specifying custom
> serializers/factories, either generically or for certain specific classes.
> In most cases, however, this won't be necessary, as the state snapshot
> itself usually holds a reference (class name) to the original user classes.
> If the catalog process has access to those classes it will use them
> directly, or else other configured serializers, and only if neither is
> available will it fall back to generating serializers for POJO/Tuple types.
> There is obviously a limit to what is possible here initially, Kryo being
> one exception where you either have the class or you don't.
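
The fallback order described above can be sketched roughly as follows. This
is only an illustration of the decision order, not Flink code:
SerializerResolution, Source, TypeKind and the set of configured class names
are hypothetical, and real resolution would work on serializer snapshots
rather than plain class names.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the resolution order; not part of Flink or the prototype. */
final class SerializerResolution {

    /** Where the serializer for a state type would come from. */
    enum Source { USER_CLASS, CONFIGURED, GENERATED }

    /** Simplified type categories used only for this example. */
    enum TypeKind { POJO, TUPLE, KRYO }

    private SerializerResolution() {}

    /**
     * 1. Use the original user class if it is on the classpath.
     * 2. Otherwise use a serializer configured for that class name.
     * 3. Otherwise generate a serializer, which is only possible for POJO/Tuple types.
     * An empty result means the state cannot be decoded (e.g. Kryo without the class).
     */
    static Optional<Source> resolve(
            String className, TypeKind kind, Set<String> configured, ClassLoader loader) {
        if (isLoadable(className, loader)) {
            return Optional.of(Source.USER_CLASS);
        }
        if (configured.contains(className)) {
            return Optional.of(Source.CONFIGURED);
        }
        if (kind == TypeKind.POJO || kind == TypeKind.TUPLE) {
            return Optional.of(Source.GENERATED);
        }
        return Optional.empty();
    }

    /** Checks class availability without running static initializers. */
    private static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        System.out.println(resolve("java.lang.String", TypeKind.POJO, Set.of(), cl));
        System.out.println(resolve("com.example.Missing", TypeKind.POJO, Set.of(), cl));
        System.out.println(resolve("com.example.Missing", TypeKind.KRYO, Set.of(), cl));
    }
}
```

Returning an empty Optional rather than throwing leaves the failure mode
(hard error vs. degrading to raw bytes) to the caller, which is the open
question raised in point 3 of the original mail.
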
>
> I would, however, like to point out that we do not have to support
> everything initially. We can start with what is currently available (the
> classpath and generated serializers); as we develop we will find the limits
> of this approach and can then extend it with configuration where that feels
> natural, instead of trying to create a super complex initial solution. But I
> definitely agree that we should support custom serializers already specified
> in the config that Flink otherwise uses for the jobs (though I think this
> should more or less work out of the box).
>
> 4. The metadata view currently reuses the existing table-valued function.
> Let's treat improving / extending the metadata view as a follow-up under
> this umbrella. I don't think we need a separate FLIP, but it also feels out
> of scope here.
>
> Cheers
> Gyula
>
>
>
>
> On Thu, Jul 2, 2026 at 1:02 PM Dennis-Mircea Ciupitu <
> [email protected]> wrote:
>
> > Hi all,
> >
> > Thank you for driving this. Being able to discover savepoints/checkpoints
> > and query their state as SQL tables without shipping the original user
> > classes is a genuinely valuable addition, and it's nice that it builds on
> > the existing state-table connector and savepoint_metadata work rather than
> > starting from scratch.
> >
> > A few points and questions, mostly around scope and the serializer story:
> >
> >    1. Non-keyed state and the DataStream path.
> >       - The FLIP scopes out BroadcastState, operator ListState and
> >       UnionState because "no readily available Table API connectors
> >       exist for these state types." That's a fair characterization of
> >       the Table layer, but the state-processor DataStream API already
> >       reads all three today (SavepointReader#readBroadcastState /
> >       #readUnionState / #readListState). So the limitation is really in
> >       the keyed-only SQL mapping (KeyedStateReader runs inside a keyed
> >       backend), not in the snapshots themselves.
> >       - Is the keyed-only scope a deliberate UX/table-mapping decision,
> >       or would a DataStream-backed reader be considered so the catalog
> >       isn't strictly less capable than the API it extends? Even if
> >       non-keyed contents stay out of scope initially, it would be good
> >       to frame this explicitly as a Table-mapping constraint rather than
> >       a general one.
> >    2. Serializer transparency - the "no user classes" premise vs. custom
> >    serializers.
> >       - The design relies on Flink's transparent serializer formats to
> >       decode state without user dependencies, which is great for
> >       POJO/Avro/basic types. But two serialization efforts point the
> >       other way: FLIP-398 [1] (released) already lets users configure
> >       serializers per type via pipeline.serialization-config, and
> >       FLIP-538 [2] (in discussion) adds pluggable custom generic-type
> >       serializers (e.g. Apache Fory) and promotes
> >       TypeSerializer/TypeSerializerSnapshot to @Public. As FLIP-538
> >       itself notes, state written with a custom serializer becomes
> >       dependent on that serializer to decode - external tooling without
> >       it cannot read those bytes.
> >       - Could we make the deserialization side pluggable and
> >       config-driven, mirroring FLIP-398's serialization-config, with a
> >       graceful fallback (e.g. expose the raw bytes / skip the column)
> >       when a format isn't transparently decodable? There already seems
> >       to be a seam for this (SavepointTypeInformationFactory), and
> >       making it a first-class, config-selectable option would keep the
> >       catalog forward-compatible as serialization support grows.
> >    3. Robustness of the transparent decoding path.
> >       - Related to (2): reconstructing values by mirroring the binary
> >       layout (PojoToRowDataDeserializer) is the most powerful but also
> >       the most fragile part of the design. How is it expected to behave
> >       across serializer schema evolution / state migration (a serializer
> >       snapshot that differs from the writer's), Kryo-fallback fields,
> >       nested/generic types, and nullability?
> >       - It would help to spell out the supported matrix and the failure
> >       mode (hard error vs. degrade to raw bytes) up front, since this is
> >       exactly where "read without the user classes" is most likely to
> >       break in practice.
> >    4. Observability / summary reporting.
> >       - The metadata view is a great start. Two small asks:
> >          - per-subtask (or per-key-group) size granularity in addition to
> >          per-operator, since skew is usually what you are chasing on
> >          large state;
> >          - optionally rounding out the size breakdown with managed/raw
> >          operator state and channel state sizes for a full picture
> >          (noting the latter are in-flight / unaligned-checkpoint buffers
> >          rather than user state).
> >       - A prominent upfront summary of the largest operators / state is
> >       often what users want before drilling in.
> >
> >
> > Best regards,
> > Dennis
> >
> > [1]
> > https://cwiki.apache.org/confluence/spaces/FLINK/pages/282102217/FLIP-398+Improve+Serialization+Configuration+And+Usage+In+Flink
> > [2]
> > https://cwiki.apache.org/confluence/spaces/FLINK/pages/373886828/FLIP-538+Support+Custom+Generic+Type+Serializer
> >
> > On Mon, Jun 29, 2026 at 12:53 PM Gyula Fóra <[email protected]> wrote:
> >
> > > Hi Flink Devs!
> > >
> > > I would like to start the discussion about FLIP-599: State Catalog [1]
> > >
> > > State and stateful processing have always been among the most
> > > fundamental features of Flink and a major contributor to its success
> > > and global adoption.
> > >
> > > Over the years several APIs and methods have been developed to address
> > > the need for external access and analytics, such as the state processor
> > > DataStream / Java APIs, the since-deprecated queryable state
> > > abstractions, and more recently a number of Table / SQL API connectors
> > > to access state metadata and keyed states in a somewhat limited way.
> > >
> > > Extending the current capabilities of the state processor API, this
> > > FLIP aims to lift state processing, analytics and observability to a
> > > new level by introducing the State Catalog.
> > >
> > > State Catalog is a Flink SQL Catalog implementation that allows
> > > discovering savepoints/checkpoints and mapping their state
> > > automatically to SQL tables. The tables are derived for the different
> > > operators and their keyed states, with schemas matching the state
> > > structure. Most importantly, it supports reading POJO / Avro and other
> > > structured and basic type states without the original user classes
> > > (dependencies) by relying on Flink's transparent and efficiently
> > > structured serializer formats.
> > >
> > > We have a fully functional prototype implementation developed with
> > > Gabor Somogyi that we will be happy to share if the community accepts
> > > the proposal!
> > >
> > > Looking forward to your feedback and suggestions!
> > >
> > > Gyula
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/spaces/FLINK/pages/438009922/FLIP-599+State+Catalog
>
