Thank you very much for the feedback.

- With this pull-request build, Flink runs successfully with a JDK 17 runtime 
for applications without saved state or with applications with saved state from 
this pull-request build which is using Kryo 5.x. FYI, the Maven build is still 
run with JDK 8 or 11 but the Flink jobmanager and taskmanager can be run with a 
JDK 17 runtime.
- Kryo 2.x is still on the classpath for backwards compatibility purposes, and 
if you try to load a savepoint from Flink 1.17 or older which uses the Kryo 2.x 
serialization library with JDK 17+, that will fail with exceptions.
- A stateful upgrade pathway looks like this: Applications run a Flink cluster 
with this pull-request under JDK 8 or 11, load an existing savepoint with Kryo 
2.x data, write out a new savepoint which automatically uses Kryo 5.x, restart 
the Flink cluster with a JDK 17 runtime, and resume from the new savepoint 
successfully.
- This pull-request build supports Java records (which obviously requires 
JDK17+ at runtime) with the Flink DataStream API. Kryo 5.x supports records so 
this works without any extra configuration. A simple demo is here: 
https://github.com/kurtostfeld/flink-kryo-upgrade-demo/blob/main/flink-record-demo/src/main/java/demo/app/Main.java.
 The app is built with JDK 17, Flink's Maven build still runs with JDK 8/11, 
but the Flink cluster uses JDK 17 at runtime.


I need to investigate the scenario you describe. If I understand correctly, the 
scenario is resuming from multiple checkpoint files or from a savepoint and 
checkpoint files which may be generated by different versions of Flink and 
therefore may be using different Kryo library versions. Is that accurate? We 
need to accommodate that scenario and I will investigate.




------- Original Message -------
On Thursday, June 1st, 2023 at 10:25 AM, Chesnay Schepler <ches...@apache.org> 
wrote:


>
>
> The version in the state is the serializer version, and applies to the
> entire state, independent of what it contains.
> If you use Kryo2 for reading and Kryo5 for writing (which also implies
> writing the new serializer version into state), then I'd assume that a
> migration is an all-or-nothing kind of deal.
> IOW, you'd have to load a savepoint and write out an entirely new
> savepoint with the new state.
> Otherwise you may have only re-written part of the checkpoint, and now
> contains a mix of Kryo2/Kryo5 serialized classes, which should then fail
> hard on any recovery attempt because we wouldn't use Kryo2 to read
> anything.
>
> If I'm right, then as is this sounds like quite a trap for users to fall
> into because from what I gathered this is the default behavior in the PR
> (I could be wrong though since I haven't fully dug through the 8k lines
> PR yet...)
>
> What we kind of want is this:
> 1) Kryo5 is used as the default for new jobs. (maybe not even that,
> making it an explicit opt-in)
> 2) Kryo2 is used for reading AND writing for existing* jobs by default.
> 3) Users can explicitly (and easily!) do a full migration of their jobs,
> after which 2) should no longer apply.
>
>
>
> In the PR you mentioned running into issues on Java 17; to have have
> some error stacktraces and examples data/serializers still around?
>
> On 30/05/2023 00:38, Kurt Ostfeld wrote:
>
> > > I’d assumed that there wasn’t a good way to migrate state stored with an 
> > > older version of Kryo to a newer version - if you’ve solved that, kudos.
> > > I hope I've solved this. The pull request is supposed to do exactly this. 
> > > Please let me know if you can propose a scenario that would break this.
> >
> > The pull-request has both Kryo 2.x and 5.x dependencies. It looks at the 
> > state version number written to the state to determine which version of 
> > Kryo to use for deserialization. Kryo 2.x is not used to write new state.
> >
> > ------- Original Message -------
> > On Monday, May 29th, 2023 at 5:17 PM, Ken Krugler 
> > kkrugler_li...@transpac.com wrote:
> >
> > > Hi Kurt,
> > >
> > > I personally think it’s a very nice improvement, and that the longer-term 
> > > goal of removing built-in Kryo support for state serialization (while a 
> > > good one) warrants a separate FLIP.
> > >
> > > Perhaps an intermediate approach would be to disable the use of Kryo for 
> > > state serialization by default, and force a user to disregard warnings 
> > > and explicitly enable it if they want to go down that path.
> > >
> > > I’d assumed that there wasn’t a good way to migrate state stored with an 
> > > older version of Kryo to a newer version - if you’ve solved that, kudos.
> > >
> > > — Ken
> > >
> > > > On May 29, 2023, at 2:21 PM, Kurt Ostfeld kurtostf...@proton.me.INVALID 
> > > > wrote:
> > > >
> > > > Hi everyone. I would like to start the discussion thread for FLIP-317: 
> > > > Upgrade Kryo from 2.24.0 to 5.5.0 [1].
> > > >
> > > > There is a pull-request associated with this linked in the FLIP.
> > > >
> > > > I'd particularly like to hear about:
> > > >
> > > > - Chesnay Schepler's request to consider removing Kryo serializers from 
> > > > the execution config. Is this a reasonable task to add into this FLIP? 
> > > > Is there consensus on how to resolve that? Would that be better 
> > > > addressed in a separate future FLIP after the Kryo upgrade FLIP is 
> > > > completed?
> > > >
> > > > - Backwards compatibility. The automated CI tests have a lot of 
> > > > backwards compatibility tests that are passing. I also wrote a Flink 
> > > > application with keyed state using custom Kryo v2 serializers and then 
> > > > an upgraded version with both Kryo v2 and Kryo v5 serializers to stress 
> > > > test the upgrade process. I'd like to hear about additional scenarios 
> > > > that need to be tested.
> > > >
> > > > - Is this worth pursuing or is the Flink project looking to go in a 
> > > > different direction? I'd like to do some more work on the pull request 
> > > > if this is being seriously considered for adoption.
> > > >
> > > > I'm looking forward to hearing everyone's feedback and suggestions.
> > > >
> > > > Thank you,
> > > > Kurt
> > > >
> > > > [1] 
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-317%3A+Upgrade+Kryo+from+2.24.0+to+5.5.0
> > >
> > > --------------------------
> > > Ken Krugler
> > > http://www.scaleunlimited.com
> > > Custom big data solutions
> > > Flink, Pinot, Solr, Elasticsearch

Reply via email to