[ https://issues.apache.org/jira/browse/HUDI-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-4959:
----------------------------
    Fix Version/s: 0.12.2
                       (was: 0.13.0)

> Serializing objects using Kryo fails to deserialize data back w/o prior registration
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-4959
>                 URL: https://issues.apache.org/jira/browse/HUDI-4959
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>    Affects Versions: 0.12.0
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.2
>
>
> Originally reported in:
> [https://github.com/apache/hudi/issues/6621]
>
> Kryo (used in SerializationUtils) by default allows objects to be serialized
> without their classes having been registered with Kryo beforehand: in that
> case Kryo encodes the first occurrence of an object of a particular class
> with the full class name, while subsequent occurrences use a class id
> associated with it on the fly.
> This poses a problem for durable serialization (when we persist the
> serialized layout): here we are trying to deserialize a file that does not
> have the class name encoded, and since the user is running a different Spark
> job to read it, the association is not preserved in memory either.
> *NOTE: We should be using custom serialization sequences for every object we
> serialize for durable persistence, and avoid using frameworks like Kryo for
> that.*
>
> ----
> *EDIT*
> I'm taking back my hypothesis that the issue is in the class encoding: after
> writing a small test to validate it, I confirmed that Kryo actually writes
> out the full class name for all classes registered implicitly (as it should).
> It seems that the problem is indeed a misalignment of the Avro versions, as
> reported by [@KnightChess|https://github.com/KnightChess]: quick-checking, I
> see that between Avro 1.8.2 and 1.10.2, {{Utf8}} had one more field added:
> {code:java}
> // 1.8.2
> private byte[] bytes = EMPTY;
> private int length;
> private String string;
> // 1.10.2
> private byte[] bytes;
> private int hash;
> private int length;
> private String string; {code}
>
> Given that we rely on Kryo to generate the serializer for {{orderingVal}},
> which could be a {{Utf8}} (the generated serializer is based on
> {{FieldSerializer}}), this would explain why it could not be deserialized
> back: the writer and the reader end up with different serializers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
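
The field mismatch described above can be illustrated with a small, self-contained sketch. It does not use Kryo or Avro: it only mimics, with plain java.io streams, a serializer that writes an object's fields back to back with no schema or field names in the payload, which is the behaviour assumed here for a field-based serializer. The class name FieldLayoutMismatch, its methods, and the exact byte layout are hypothetical and chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: NOT Kryo's or Avro's actual wire format.
class FieldLayoutMismatch {

    // Writer side: mimics the 1.8.2 field layout (bytes, length, string).
    public static byte[] writeOldLayout(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(bytes.length); // size prefix for the `bytes` array
            out.write(bytes);           // field: bytes
            out.writeInt(bytes.length); // field: length
            out.writeUTF(value);        // field: string
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reader that expects the same layout as the writer.
    static String readOldLayout(byte[] payload) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            int length = in.readInt();
            return in.readUTF();
        }
    }

    // Reader that expects the 1.10.2 layout (bytes, hash, length, string).
    static String readNewLayout(byte[] payload) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            int hash = in.readInt();   // actually consumes the writer's `length`
            int length = in.readInt(); // actually consumes the start of `string`
            return in.readUTF();       // reads from a misaligned offset
        }
    }

    // Returns true only if the payload decodes cleanly to the expected value.
    public static boolean canRead(byte[] payload, boolean newLayout, String expected) {
        try {
            String actual = newLayout ? readNewLayout(payload) : readOldLayout(payload);
            return expected.equals(actual);
        } catch (IOException e) {
            return false; // ran off the end of the payload or hit malformed data
        }
    }

    public static void main(String[] args) {
        byte[] payload = writeOldLayout("key-1");
        System.out.println("same layout reads back: " + canRead(payload, false, "key-1"));
        System.out.println("new layout reads back:  " + canRead(payload, true, "key-1"));
    }
}
```

Because nothing in the payload identifies the fields, the reader with the extra hash field consumes the writer's length in its place and everything after it is shifted. This is the kind of failure the NOTE above argues for avoiding by using explicit, versioned serialization for durably persisted data.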