Josh Rosen created SPARK-27799:
----------------------------------

             Summary: Allow SerializerManager.canUseKryo to be customized via 
configuration
                 Key: SPARK-27799
                 URL: https://issues.apache.org/jira/browse/SPARK-27799
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0
            Reporter: Josh Rosen


Kryo serialization can offer a substantial performance boost compared to Java 
serialization and I generally recommend that users configure Spark to use it.

In general it may not be safe to blindly flip the default to Kryo: certain jobs 
might depend on Java serialization, so switching them to Kryo might cause 
crashes or incorrect behavior.

However, we may know that certain data types are safe to serialize with Kryo, 
in which case we can whitelist _just those types_ for use with Kryo 
serialization but keep everything else using the default Java serializer.

Back in SPARK-13926 (Spark 2.0) I added a {{SerializerManager}} to implement 
this idea for strings, primitives, primitive arrays, and a few other data 
types: those types will automatically use Kryo serialization when at top-level 
types in RDDs. However, there's no ability for users to customize / extend this 
whitelist.

I propose to add a new user-facing configuration, name TBD, which accepts a 
comma-separated list of class / interface names and uses the to expand the 
{{SerializerMananger.canUseKryo}} whitelist.

This will allow advanced users to incrementally default to Kryo for certain 
types (e.g. Scrooge ThriftStructs).

This feature is useful for "data platform" teams who provide Spark-as-a-service 
to internal customers: with this proposed configuration, platform teams can 
configure global defaults for serialization in a way which is more incremental 
/ narrow-in-scope than simply defaulting to Kryo everywhere.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to