Hi Gyula,

Thanks for the detailed answers. This addresses my questions well and the direction sounds great.
+1 (non-binding) from my side.

Best regards,
Dennis

On Thu, Jul 2, 2026 at 3:26 PM Gyula Fóra <[email protected]> wrote:

> Hi Dennis!
>
> Thank you for the questions. Much recent work in the state connector API
> has been done basically towards this type of nice cataloging and flexible
> access. There are a few holes and things that have to be changed; not
> everything is enumerated in the FLIP, but we have to have an open mind and
> make all necessary changes, as you said, to make this as nice and
> comprehensive as possible. Most state processor APIs are marked
> experimental, so we can be flexible within reason :)
>
> Now to the concrete questions:
>
> 1. Non-keyed state support / scope
> I think non-keyed states should definitely be in the scope of the FLIP in
> terms of design, and my intention was not to exclude them; I just focused
> on the keyed state as that is readily available in our prototype
> implementation (without many changes to the existing connectors). I will
> try to update the FLIP to cover non-keyed states in more detail, but I
> think the case is pretty straightforward. From a table representation
> perspective, they can follow a similar pattern such as:
> uid_opUID_statename_broadcast, uid_opUID_statename_list. A corresponding
> SQL connector can easily be added to support these based on the existing
> DataStream connector. I will make sure to add separate tickets for these
> types of states once the FLIP is accepted, and this work can very easily
> be parallelized across different state types within the existing catalog
> frameworks. This way keyed/non-keyed states will live directly together in
> a single catalog/db.
>
> In the future we can even go a step further and include connector-specific
> state views, such as Kafka offsets, with custom connector-specific
> plugins.
>
> 2/3. Serializer transparency and robustness
> From a practical standpoint both generated (synthetic) serializers and
> custom classes / Kryo and pluggable logic could work, but the whole
> catalog concept requires a certain behaviour to be useful. The catalog
> would point to savepoint directories and discover all state in them
> (potentially from multiple jobs). Configuration has to be done in a
> generic way; I don't see a problem with introducing configs for specifying
> custom serializers/factories, either generically or for certain specific
> classes. In most cases, however, this won't be necessary, as the state
> snapshot itself usually has a reference (class name) to the original user
> classes. If the catalog process has access to those classes it will use
> them directly, or other configured serializers, and only if none are
> available fall back to generating serializers for POJO/TUPLE types. There
> is obviously a limit to what is possible here initially, Kryo being one
> exception where you either have the class or not.
>
> I would, however, like to point out that we do not have to support
> everything initially; we can start with what is currently available, use
> the classpath / generated serializers, and as we develop we will find the
> limits of this approach and can then extend it with configuration as it
> feels natural, instead of trying to create a super complex initial
> solution. But I definitely agree that we should support custom serializers
> already specified in the config that is otherwise used by Flink for the
> jobs (but I think this should more or less work out of the box).
>
> 4. The metadata view is currently reused based on the existing
> table-valued function. Let's take this as a follow-up under this umbrella
> to improve / extend the metadata view. I don't think we need a separate
> FLIP, but it also feels out of scope here.
>
> Cheers
> Gyula
>
> On Thu, Jul 2, 2026 at 1:02 PM Dennis-Mircea Ciupitu <[email protected]> wrote:
>
> > Hi all,
> >
> > Thank you for driving this. Being able to discover savepoints/checkpoints
> > and query their state as SQL tables without shipping the original user
> > classes is a genuinely valuable addition, and it's nice that it builds on
> > the existing state-table connector and savepoint_metadata work rather
> > than starting from scratch.
> >
> > A few points and questions, mostly around scope and the serializer story:
> >
> > 1. Non-keyed state and the DataStream path.
> >    - The FLIP scopes out BroadcastState, operator ListState and
> >      UnionState because "no readily available Table API connectors exist
> >      for these state types." That's a fair characterization of the Table
> >      layer, but the state-processor DataStream API already reads all
> >      three today (SavepointReader#readBroadcastState / #readUnionState /
> >      #readListState). So the limitation is really in the keyed-only SQL
> >      mapping (KeyedStateReader runs inside a keyed backend), not in the
> >      snapshots themselves.
> >    - Is the keyed-only scope a deliberate UX/table-mapping decision, or
> >      would a DataStream-backed reader be considered so the catalog isn't
> >      strictly less capable than the API it extends? Even if non-keyed
> >      contents stay out of scope initially, it would be good to frame this
> >      explicitly as a Table-mapping constraint rather than a general one.
> > 2. Serializer transparency - the "no user classes" premise vs. custom
> >    serializers.
> >    - The design relies on Flink's transparent serializer formats to
> >      decode state without user dependencies, which is great for
> >      POJO/Avro/basic types.
> >      But two serialization efforts point the other way: FLIP-398 [1]
> >      (released) already lets users configure serializers per type via
> >      pipeline.serialization-config, and FLIP-538 [2] (in discussion) adds
> >      pluggable custom generic-type serializers (e.g. Apache Fory) and
> >      promotes TypeSerializer/TypeSerializerSnapshot to @Public. As
> >      FLIP-538 itself notes, state written with a custom serializer
> >      becomes dependent on that serializer to decode - external tooling
> >      without it cannot read those bytes.
> >    - Could we make the deserialization side pluggable and config-driven,
> >      mirroring FLIP-398's serialization-config, with a graceful fallback
> >      (e.g. expose the raw bytes / skip the column) when a format isn't
> >      transparently decodable? There already seems to be a seam for this
> >      (SavepointTypeInformationFactory), and making it a first-class,
> >      config-selectable option would keep the catalog forward-compatible
> >      as serialization support grows.
> > 3. Robustness of the transparent decoding path.
> >    - Related to (2): reconstructing values by mirroring the binary
> >      layout (PojoToRowDataDeserializer) is the most powerful but also the
> >      most fragile part of the design. How is it expected to behave across
> >      serializer schema evolution / state migration (a serializer snapshot
> >      that differs from the writer's), Kryo-fallback fields,
> >      nested/generic types, and nullability?
> >    - It would help to spell out the supported matrix and the failure
> >      mode (hard error vs. degrade to raw bytes) up front, since this is
> >      exactly where "read without the user classes" is most likely to
> >      break in practice.
> > 4. Observability / summary reporting.
> >    - The metadata view is a great start.
> >      Two small asks:
> >      - per-subtask (or per-key-group) size granularity in addition to
> >        per-operator, since skew is usually what you are chasing on
> >        large state;
> >      - optionally rounding out the size breakdown with managed/raw
> >        operator state and channel state sizes for a full picture (noting
> >        the latter are in-flight / unaligned-checkpoint buffers rather
> >        than user state).
> >    - A prominent upfront summary of the largest operators / state is
> >      often what users want before drilling in.
> >
> > Best regards,
> > Dennis
> >
> > [1] https://cwiki.apache.org/confluence/spaces/FLINK/pages/282102217/FLIP-398+Improve+Serialization+Configuration+And+Usage+In+Flink
> > [2] https://cwiki.apache.org/confluence/spaces/FLINK/pages/373886828/FLIP-538+Support+Custom+Generic+Type+Serializer
> >
> > On Mon, Jun 29, 2026 at 12:53 PM Gyula Fóra <[email protected]> wrote:
> >
> > > Hi Flink Devs!
> > >
> > > I would like to start the discussion about FLIP-599: State Catalog [1]
> > >
> > > State and stateful processing has always been one of the most
> > > fundamental features of Flink and a major contributor to its success
> > > and global adoption.
> > >
> > > Over the years several APIs and methods have been developed to address
> > > the need for external access and analytics, such as the state processor
> > > DataStream / Java APIs, the since-deprecated queryable state
> > > abstractions and more recently a number of Table / SQL API connectors
> > > to access state metadata and keyed states in a somewhat limited way.
> > >
> > > Extending the current capabilities of the state-processor-api, this
> > > FLIP aims to lift state processing, analytics and observability to a
> > > new level by introducing the State Catalog.
> > >
> > > State Catalog is a Flink SQL Catalog implementation that allows
> > > discovering savepoints/checkpoints and mapping their state
> > > automatically to SQL tables.
> > > The tables are derived for the different operators and their keyed
> > > states with schema matching the state structure. Most importantly it
> > > supports reading POJO / Avro and other structured and basic type states
> > > without the original user classes (dependencies) by relying on Flink's
> > > transparent and efficiently structured serializer formats.
> > >
> > > We have a fully functional prototype implementation developed with
> > > Gabor Somogyi that we will be happy to share if the community accepts
> > > the proposal!
> > >
> > > Looking forward to your feedback and suggestions!
> > >
> > > Gyula
> > >
> > > [1] https://cwiki.apache.org/confluence/spaces/FLINK/pages/438009922/FLIP-599+State+Catalog
