[
https://issues.apache.org/jira/browse/FLINK-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731784#comment-15731784
]
ASF GitHub Bot commented on FLINK-5051:
---------------------------------------
GitHub user StefanRRichter opened a pull request:
https://github.com/apache/flink/pull/2962
[FLINK-5051] Backwards compatibility for serializers in backends
This PR sits on top of PR #2781 and introduces future backwards
compatibility for state serializers and backends. We do so by providing version
compatibility checking for TypeSerializer and making the serializers a mandatory
part of a keyed backend's meta data in checkpoints (so that we have everything
required to reconstruct states in a self-contained way). A serialization proxy
is introduced for the keyed state backend and the operator state backend.
Currently, this serialization proxy covers only the meta data, not yet the
actual data. For the most part, the PR moves functionality to a different place
or makes formats more explicit.
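To illustrate the idea of a versioned serialization proxy for meta data, here is a minimal sketch. It is not the code from this PR: the names `Versioned`, `MetaInfoProxy`, and `roundTrip`, the `isCompatibleWith(long)` signature, and the version numbers are all assumptions made for the example. The proxy writes its version ahead of the meta data and refuses to read a version it does not consider compatible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionedProxySketch {

    /** Hypothetical interface; the signature is an assumption, not Flink's API. */
    interface Versioned {
        long getVersion();
        boolean isCompatibleWith(long writtenVersion);
    }

    /** Hypothetical proxy that writes its version before the meta data. */
    static final class MetaInfoProxy implements Versioned {
        static final long VERSION = 2L; // illustrative version number
        String stateName;

        MetaInfoProxy(String stateName) {
            this.stateName = stateName;
        }

        @Override
        public long getVersion() {
            return VERSION;
        }

        // This sketch accepts every version from 1 up to its own version.
        @Override
        public boolean isCompatibleWith(long writtenVersion) {
            return writtenVersion >= 1 && writtenVersion <= VERSION;
        }

        void write(DataOutputStream out) throws IOException {
            out.writeLong(getVersion()); // version goes first
            out.writeUTF(stateName);     // then the meta data itself
        }

        static MetaInfoProxy read(DataInputStream in) throws IOException {
            long written = in.readLong();
            MetaInfoProxy proxy = new MetaInfoProxy(null);
            if (!proxy.isCompatibleWith(written)) {
                throw new IOException("Incompatible meta data version: " + written);
            }
            proxy.stateName = in.readUTF();
            return proxy;
        }
    }

    /** Writes a proxy to a byte buffer and reads it back. */
    static String roundTrip(String stateName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                new MetaInfoProxy(stateName).write(out);
            }
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return MetaInfoProxy.read(in).stateName;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("window-contents"));
    }
}
```

Writing the version first means a reader can decide how to interpret the remaining bytes before touching them, which is what makes later format changes possible.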
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StefanRRichter/flink serializer-backwards-compatibility-operator
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2962.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2962
----
commit a373585c2fe71b467f49f0e295dc647b43ab7a9c
Author: Stefan Richter <[email protected]>
Date: 2016-11-01T11:29:01Z
Backwards compatibility 1.1 -> 1.2
commit 8e4e4bcede50e66a95928ec854e51d45a7df28bf
Author: Stefan Richter <[email protected]>
Date: 2016-11-09T13:54:35Z
Removing some unnecessary code from migration classes
commit 78bd66fade7f836eafbab978329caf1ea26f2ffc
Author: Stefan Richter <[email protected]>
Date: 2016-11-09T17:21:13Z
MultiStreamStateHandle
commit a9355679c3476dd890b54312e1696b61c7839873
Author: Stefan Richter <[email protected]>
Date: 2016-11-10T13:18:55Z
Added migration unit test
commit d079bd4bdb762c307a3c5cd084590804b90996b1
Author: Stefan Richter <[email protected]>
Date: 2016-11-10T13:45:58Z
rebase fixes
commit 9f47bac9c25fc33993c3942a57462039cc578dcd
Author: Stefan Richter <[email protected]>
Date: 2016-11-11T13:46:39Z
Minor cleanups: deleting more unnecessary classes
commit 2bbe66386d28c7914c62e2c3829ff3ab6840164c
Author: Stefan Richter <[email protected]>
Date: 2016-11-23T13:15:33Z
Versioned serialization
commit 6460e27717ab208aada988ba2c83d5628b31b310
Author: Stefan Richter <[email protected]>
Date: 2016-11-23T17:59:45Z
Common meta info introduced to keyed backends
commit e7d66377730339523bad8e3e6e75865ea5a29a6b
Author: Stefan Richter <[email protected]>
Date: 2016-11-23T21:40:26Z
Introducing isCompatibleWith to TypeSerializers
commit 89e3779d231fd0dadb01782791c92ec8ebb15a81
Author: Stefan Richter <[email protected]>
Date: 2016-11-23T22:33:42Z
Splitting / Introducing interface for versioned and compatible
commit 434f9424e5cd0e01d45e51f44b917306606c5fb1
Author: Stefan Richter <[email protected]>
Date: 2016-11-24T10:59:01Z
Cleanup and documentation
commit 5cb40348dac235dfbe6c6fda532f2b87a6aee7f9
Author: Stefan Richter <[email protected]>
Date: 2016-11-24T16:19:51Z
Better abstractions
commit 500361fb07a428034cb96deb46ece3531d277080
Author: Stefan Richter <[email protected]>
Date: 2016-11-24T16:29:24Z
Serialization proxies for operator state backend meta data
commit 614ab7531a644eaf9edbc420383936bb6e39a34b
Author: Stefan Richter <[email protected]>
Date: 2016-12-01T14:12:40Z
handle one forgotten type of state handle
commit 11792a84c6dbf5057a46fc98b5190abf2cfad014
Author: Stefan Richter <[email protected]>
Date: 2016-12-01T14:37:29Z
Serialization Proxies for KeyedBackends, OperatorBackends, TypeSerializers.
Still needs integration in OperatorBackend.
commit 89dcc375bd732a370609c10bcaaf5c3b42e93b98
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T00:19:44Z
isCompatibleWith, code dedup and cleanup
commit 3bf993be2aaf17d7f71f0b3709b15aeb78baed6b
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T00:20:05Z
Tests
commit 336fbedf8699792da3586ebe5d30d2644f0abe08
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T00:43:22Z
Some compatibility logic for compound tuple serializer
commit 4ad6fc7884bede64f15d6251e752c97222ac665a
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T11:29:43Z
Partial rollback, going for the simplest approach. Also including some
info about state type to serialization
commit d7eed9bc8b13803aa01fc682b22c100ba0e76072
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T12:15:59Z
fixup for base branch, todo cherrypick
commit e962751de45e00d25e063d2f6f19f82dff8f4838
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T15:58:24Z
Fix for 1/0 if UDF present
commit d827f3dd27054844dd154e620e7a1cd75d43fdf5
Author: Stefan Richter <[email protected]>
Date: 2016-12-02T16:00:28Z
Fix for Unknown State type in statemetainfo
commit 5fb822e478acb6bd046b7376834e9e16e80dffd1
Author: Stefan Richter <[email protected]>
Date: 2016-12-05T13:23:49Z
Introduce Eager restore and serialization proxies in
DefaultOperatorStateBackend
commit dbb54e765bc163c9b33f014b60ecfed8bc98f65c
Author: Stefan Richter <[email protected]>
Date: 2016-12-06T15:00:59Z
WIP offset stream
commit 2331f62ff3a21d9a1337d336573c3eb4b8305e7e
Author: Stefan Richter <[email protected]>
Date: 2016-12-07T14:37:23Z
Backwards compatibility for JobVertexID generation.
commit 820f1fde8899d6068a6ad08cf1f93312d3f238c9
Author: Stefan Richter <[email protected]>
Date: 2016-12-07T15:25:59Z
Backwards compatibility for JobVertexID generation ->
StateAssignmentOperation.
commit 3e2c877bb3a4049a817a23f7144d3234e65af0f4
Author: Stefan Richter <[email protected]>
Date: 2016-12-07T15:34:32Z
Backwards compatibility for JobVertexID generation -> Fixups.
commit 6182fb4d40ab5bfb64730276e53435d8206c1373
Author: Stefan Richter <[email protected]>
Date: 2016-12-07T15:55:39Z
unit test for legacy jobvertexid
commit a8753e57054c0e043c01eb4b29558ad3467e0de5
Author: Stefan Richter <[email protected]>
Date: 2016-12-07T20:23:35Z
[FLINK-5283] Fix closing streams when restoring old savepoint in keyed
backends
commit 5f4bd4c352a27a3dadd606508ade3ba33c729ad3
Author: Stefan Richter <[email protected]>
Date: 2016-12-07T20:25:29Z
[FLINK-5282] Fix closing streams on exception in SavepointV0Serializer
----
> Backwards compatibility for serializers in backend state
> --------------------------------------------------------
>
> Key: FLINK-5051
> URL: https://issues.apache.org/jira/browse/FLINK-5051
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Reporter: Stefan Richter
> Assignee: Stefan Richter
>
> When a new state is registered, e.g. in a keyed backend via
> `getPartitionedState`, the caller has to provide all type serializers
> required for the persistence of state components. Explicitly passing the
> serializers on state creation already allows for potential version upgrades
> of serializers.
> However, those serializers are currently not part of any snapshot and are
> only provided at runtime, when the state is newly registered or restored. For
> backwards compatibility, this has strong implications: checkpoints are not
> self-contained, in that state is currently a black box without knowledge about
> its corresponding serializers. Most cases where we would need to restructure
> the state are basically lost. We could only convert them lazily at runtime
> and only once the user is registering the concrete state, which might happen
> at unpredictable points.
> I suggest adapting our solution as follows:
> - As now, all states are registered with their set of serializers.
> - Unlike now, all serializers are written to the snapshot. This makes
> savepoints self-contained and also allows us to create inspection tools for
> savepoints at some point in the future.
> - Introduce an interface {{Versioned}} with {{long getVersion()}} and
> {{boolean isCompatible(Versioned v)}} which is then implemented by
> serializers. Compatible serializers must ensure that they can deserialize
> older versions, and can then serialize them in their new format. This is how
> we upgrade.
> We need to find the right tradeoff in how many places we need to store the
> serializers. I suggest writing them once per parallel operator instance for
> each state, i.e. we have a map with state_name -> tuple3<serializer<KEY>,
> serializer<NAMESPACE>, serializer<STATE>>. This could go before all
> key-groups are written, right at the head of the file. Then, for each file we
> see on restore, we can first read the serializer map from the head of the
> stream, then go through the key groups by offset.
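The layout suggested in the last paragraph of the issue could look roughly like the following sketch. Everything here is an assumption made for illustration: the class `SnapshotLayoutSketch`, the `Snapshot` holder, the choice to store serializers as plain identifier strings, and the offset table measured from the start of the key-group data. It only shows the ordering "serializer map at the head, then key groups by offset", not Flink's actual on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SnapshotLayoutSketch {

    /** Hypothetical holder for what a restore reads back. */
    static final class Snapshot {
        final Map<String, String[]> serializers = new LinkedHashMap<>();
        final List<byte[]> keyGroups = new ArrayList<>();
    }

    /**
     * Layout: [state count][state name, key/namespace/state serializer id]*
     *         [key-group count][offset per key group][key-group bytes].
     * Each map value is assumed to hold exactly three serializer ids.
     */
    static byte[] write(Map<String, String[]> serializers, List<byte[]> keyGroups) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(serializers.size());
            for (Map.Entry<String, String[]> entry : serializers.entrySet()) {
                out.writeUTF(entry.getKey());
                for (String id : entry.getValue()) {
                    out.writeUTF(id);
                }
            }
            out.writeInt(keyGroups.size());
            long offset = 0; // relative to the start of the key-group data
            for (byte[] group : keyGroups) {
                out.writeLong(offset);
                offset += group.length;
            }
            for (byte[] group : keyGroups) {
                out.write(group);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the serializer map from the head, then slices key groups by offset. */
    static Snapshot read(byte[] data) {
        Snapshot snapshot = new Snapshot();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int states = in.readInt();
            for (int i = 0; i < states; i++) {
                String name = in.readUTF();
                String[] ids = {in.readUTF(), in.readUTF(), in.readUTF()};
                snapshot.serializers.put(name, ids);
            }
            long[] offsets = new long[in.readInt()];
            for (int i = 0; i < offsets.length; i++) {
                offsets[i] = in.readLong();
            }
            // available() is exact here because the source is an in-memory array.
            byte[] body = new byte[in.available()];
            in.readFully(body);
            for (int i = 0; i < offsets.length; i++) {
                int start = (int) offsets[i];
                int end = i + 1 < offsets.length ? (int) offsets[i + 1] : body.length;
                snapshot.keyGroups.add(Arrays.copyOfRange(body, start, end));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return snapshot;
    }

    public static void main(String[] args) {
        Map<String, String[]> serializers = new LinkedHashMap<>();
        // Serializer ids are illustrative strings only.
        serializers.put("counts", new String[] {"key-ser", "namespace-ser", "state-ser"});
        List<byte[]> groups = Arrays.asList(new byte[] {1, 2}, new byte[0], new byte[] {3});
        Snapshot restored = read(write(serializers, groups));
        System.out.println(restored.serializers.keySet() + " / "
                + restored.keyGroups.size() + " key groups");
    }
}
```

Because the serializer map sits at the head of the stream, a reader can learn how each state was serialized before it touches any key-group data.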
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)