GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/3834

    [FLINK-6425] [runtime] Activate serializer upgrades in state backends

    This is a follow-up PR that finalizes serializer upgrades, and is based on 
#3804 (therefore, only the 2nd and 3rd commits, ed82173 and e77096a is 
relevant).
    
    This PR includes the following changes:
    1. Write configuration snapshots of serializers along with checkpoints 
(this changes serialization format of checkpoints).
    2. On restore, confront configuration snapshots with newly registered 
serializers using the new 
`TypeSerializer#getMigrationStrategy(TypeSerializerConfigSnapshot)` method.
    3. Serializer upgrades is completed if the confrontation determines that no 
migration is needed. The confrontation reconfigures the new serializer if the 
case requires. If the serializer cannot be reconfigured to avoid state 
migration, the job simply fails (as we currently do not have the actual state 
migration feature).
    
    Note that the confrontation of config snapshots is currently only performed 
in the `RocksDBKeyedStateBackend`, which is the only place where this is 
currently needed due to its lazy deserialization characteristic. After we have 
eager state migration in place, the confrontation should happen for all state 
backends on restore.
    
    ## Tests
    - Serialization compatibility of the new checkpoint format is covered with 
existing tests.
    - Added a test that makes sure `InvalidClassException` is also caught when 
deserializing old serializers in the checkpoint (which occurs if the old 
serializer implementation was changed and results in a new serialVersionUID).
    - Added tests for Java serialization failure resilience when reading the 
new checkpoints, in `SerializerProxiesTest`.
    - Added end-to-end snapshot + restore tests which require reconfiguration 
of the `KryoSerializer` and `PojoSerializer` in cases where registration order 
of Kryo classes / Pojo types were changed.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-6425

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3834
    
----
commit 538a7acecce0d72e36e3726c0df2b6b96be35fc3
Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
Date:   2017-05-01T13:32:10Z

    [FLINK-6190] [core] Migratable TypeSerializers
    
    This commit introduces the user-facing APIs for migratable
    TypeSerializers. The new user-facing APIs are:
    
    - new class: TypeSerializerConfigSnapshot
    - new class: ForwardCompatibleSerializationFormatConfig
    - new method: TypeSerializer#snapshotConfiguration()
    - new method: TypeSerializer#reconfigure(TypeSerializerConfigSnapshot)
    - new enum: ReconfigureResult

commit ed82173fe97c6e9fb0784696bc4c49f10cc4e556
Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
Date:   2017-05-02T11:35:18Z

    [hotfix] [core] Catch InvalidClassException in 
TypeSerializerSerializationProxy
    
    Previously, the TypeSerializerSerializationProxy only uses the dummy
    ClassNotFoundDummyTypeSerializer as a placeholder in the case where the
    user uses a completely new serializer and deletes the old one.
    
    There is also the case where the user changes the original serializer's
    implementation and results in an InvalidClassException when trying to
    deserialize the serializer. We should also use the
    ClassNotFoundDummyTypeSerializer as a temporary placeholder in this
    case.

commit e77096af29b4cbea26113928fe93218c075e4035
Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
Date:   2017-05-06T12:40:58Z

    [FLINK-6425] [runtime] Activate serializer upgrades in state backends
    
    This commit fully activates state serializer upgrades by changing the
    following:
    - Include serializer configuration snapshots in checkpoints
    - On restore, use configuration snapshots to confront new serializers to
      perform the upgrade

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to