[ 
https://issues.apache.org/jira/browse/SPARK-34023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259996#comment-17259996
 ] 

Nick Nezis commented on SPARK-34023:
------------------------------------

Good question. And perhaps the answer is that it's not worth doing the upgrade. 
I'll try to do my best to gather answers for these questions. 

Some of the major changes which were documented [in Google 
Groups.|https://groups.google.com/g/kryo-users/c/sBZ10dwrwFQ] This write up is 
from 2018, so not sure if any of the information is outdated:
 * Generics has been redone to support all scenarios. 
[GenericsUtil|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/GenericsUtil.java]
 thoroughly discovers the generic types that are known at compile time 
([tests|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/test/com/esotericsoftware/kryo/util/GenericsUtilTest.java]),
 which avoids as much work as possible during serialization. The generic types 
known only at serialization time are tracked by Kryo's instance of 
[Generics|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/Generics.java],
 which maintains a stack of 
[GenericType|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/util/Generics.java#L221-L222]
 instances (lots of nice javadocs in those classes).

How does using the new generics API look? For a simple example with just one 
type parameter, see 
[CollectionSerializer|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/CollectionSerializer.java#L80-L97].
 To get the class, call kryo.getGenerics().nextGenericClass() then after 
reading the child objects call kryo.getGenerics().popGenericType(). Simple!

nextGenericClass() is a shortcut for the common case of a class with a single 
type parameter. When there are multiple type parameters, like 
[Map|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/MapSerializer.java#L79-L89],
 use nextGenericTypes() instead, then call resolve() to get each class. Also, 
the last parameter is made current automatically (meaning it is pushed by 
Generics#nextGenericTypes()), so the other parameter(s) have to be [pushed and 
popped 
manually|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/serializers/MapSerializer.java#L93-L101].
 Somewhat less simple!

It's a little tricky, but these two patterns are all serializers need to worry 
about. It provides full support for nested generic types, eg 
HashMap<ArrayList<Integer>, ArrayList<String>>, which wasn't possible before.

Feedback on all this is welcome, but the core of it is relatively complex and 
may induce a headache (it certainly did when writing it!). The important bits 
are 1) to handle all generics scenarios, and 2) to minimize work at 
serialization time. Digging through just the calls for generics made from 
serializers, you'll find minimal work is done and without allocations. Given 
this, generics are always enabled.


 * Serializer method signatures have changed for [issue 
#146|https://github.com/EsotericSoftware/kryo/pull/146] in [this 
commit|https://github.com/EsotericSoftware/kryo/commit/d22a23dccd9b38461950084188f69597f29c4a9a].
 All Serializer classes need to be changed from Class<T> to Class<? extends T>.


 * Various serialization improvements. Eg, TaggedFieldSerializer now has 
acceptsNull=true, which saves 1 byte for all non-null objects.


 * Unsafe had permeated the API. IMO it should be sandboxed as much as 
possible. I began refactoring by removing Unsafe support completely, made 
everything nice, then put back the input/output streams. I haven't yet looked 
into what serializers would need to make use of all of the Unsafe features. 
With Java 9 dropping Unsafe, maybe it is not worth the considerable effort to 
support it. FWIW, I personally don't use Unsafe but I know others do. I assume 
some even choosing Kryo specifically for Unsafe. I don't know how those people 
feel about Java 9.

I have not looked at Java 9 at all yet. I don't know if it provides APIs that 
Kryo can use to be more efficient.


 * FieldSerializer had gotten messy internally. It did a lot more work than 
necessary to build the cached fields. It did things with generics that were 
suspect and did expensive computations that were not even used. It had a lot of 
Unsafe logic.


 * Various API improvements. The Kryo class must not become a junk drawer for 
serializer settings. The SerializerFactory classes can be used, eg 
[FieldSerializerFactory|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/SerializerFactory.java#L96-L98],
 which can create new, configured serializers 
([example|http://n4te.com/x/367-0yar.txt]). Having Input/Output varint and 
varlong be a hint was odd. If this ends up being needed again, maybe depending 
on what happens with Unsafe, it could be done in Input/Output subclasses 
without affecting the base class API. Javadocs are much improved.


 * Logging is improved. Some useless junk was logged, some important junk was 
not logged, and the formatting of log messages was not consistent.


 * Deserialization still [temporarily 
modifies|https://github.com/EsotericSoftware/kryo/blob/kryo-5.0.0-dev/src/com/esotericsoftware/kryo/io/Input.java#L604-L606]
 the input buffer. While in many cases this is fine, it can be quite an 
unexpected gotcha in some cases. I'd like to remedy this but string writing is 
important and any changes here need extensive benchmarks.

> Updated to Kryo 5.0.3
> ---------------------
>
>                 Key: SPARK-34023
>                 URL: https://issues.apache.org/jira/browse/SPARK-34023
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Nick Nezis
>            Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to