[ 
https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845336#comment-16845336
 ] 

Josh Rosen commented on SPARK-23978:
------------------------------------

+1; I've also seen this in unit tests of my own Spark libraries.

We may be able to fix this by performing the class existence check once per JVM 
(i.e. caching its outcome) instead of on every Kryo initialization.
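
A minimal sketch of that idea in plain Java follows. It is not taken from the Spark code base: the class name ClassExistenceCache and its exists method are hypothetical, and it assumes the classpath and class loader stay the same for the lifetime of the JVM, so a cached "missing" answer never goes stale.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of Spark): remembers, JVM-wide, whether a
// class name could be resolved, so repeated checks skip the class loader.
final class ClassExistenceCache {
    // One entry per class name; static, so shared by every caller in the JVM.
    private static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();

    private ClassExistenceCache() {}

    // Returns true if the class can be loaded. Only the first call for a
    // given name reaches the class loader; later calls read the cached result.
    static boolean exists(String className) {
        return CACHE.computeIfAbsent(className, ClassExistenceCache::lookup);
    }

    private static boolean lookup(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassExistenceCache.class.getClassLoader();
        }
        try {
            // initialize=false: test for presence without running static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

With something like this, the registration code could call ClassExistenceCache.exists(name) before registering each optional class, so a failed lookup is paid for at most once per class name rather than on each Kryo initialization. If classes can appear later (for example through a different class loader), the cache would need to be keyed by loader or invalidated.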

> Kryo much slower when mllib jar not on classpath
> ------------------------------------------------
>
>                 Key: SPARK-23978
>                 URL: https://issues.apache.org/jira/browse/SPARK-23978
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: Windows 10, Java 8
>            Reporter: Richard Wilkinson
>            Priority: Minor
>         Attachments: kryo_stats.png
>
>
> Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
> classes to the Kryo registration, but it looks them up via Class.forName.
> If the mllib jar is not on the classpath, these lookups can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in 
> 2.3 than in 2.2.1.
> I have attached jVisualVM stats for both cases; you can see that a vast amount 
> of time is spent in Utils.classForName. While debugging, I traced this to the 
> Kryo initialization.



