[ https://issues.apache.org/jira/browse/SPARK-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631820#comment-14631820 ]
Matt Cheah commented on SPARK-5269:
-----------------------------------

Sweet - working with someone else on this actually, but assigning to me is good. I expect that using the Kryo resource pool will provide a fairly elegant solution.

> BlockManager.dataDeserialize always creates a new serializer instance
> ---------------------------------------------------------------------
>
>                 Key: SPARK-5269
>                 URL: https://issues.apache.org/jira/browse/SPARK-5269
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Ivan Vergiliev
>            Assignee: Matt Cheah
>              Labels: performance, serializers
>
> BlockManager.dataDeserialize always creates a new instance of the serializer,
> which is pretty slow in some cases. I'm using Kryo serialization and have a
> custom registrator, and its register method is showing up as taking about 15%
> of the execution time in my profiles. This started happening after I
> increased the number of keys in a job with a shuffle phase by a factor of 40.
> One solution I can think of is to create a ThreadLocal SerializerInstance for
> the defaultSerializer, and only create a new one if a custom serializer is
> passed in. AFAICT a custom serializer is passed only from
> DiskStore.getValues, and that, on the other hand, depends on the serializer
> passed to ExternalSorter. I don't know how often this is used, but I think
> this can still be a good solution for the standard use case.
> Oh, and also - ExternalSorter already has a SerializerInstance, so if the
> getValues method is called from a single thread, maybe we can pass that
> directly?
> I'd be happy to try a patch but would probably need a confirmation from
> someone that this approach would indeed work (or an idea for another).
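
For illustration only, the ThreadLocal idea from the issue description could be sketched roughly as below. This is not Spark code and not the Kryo resource pool approach mentioned in the comment. SketchSerializer, SerializerCacheSketch and instanceFor are hypothetical names standing in for Spark's Serializer, BlockManager and the lookup that dataDeserialize would need. The sketch assumes a serializer instance is safe to reuse across calls as long as it stays on one thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a serializer whose newInstance() is expensive,
// e.g. because it runs a custom Kryo registrator each time.
final class SketchSerializer {
    // Counts how many instances were built, so the caching effect is visible.
    final AtomicInteger instancesCreated = new AtomicInteger();

    Object newInstance() {
        instancesCreated.incrementAndGet();
        return new Object(); // placeholder for a real serializer instance
    }
}

// Hypothetical holder for the caching logic; not part of Spark.
final class SerializerCacheSketch {
    static final SketchSerializer DEFAULT = new SketchSerializer();

    // One default instance per thread, created lazily on first access.
    private static final ThreadLocal<Object> CACHED_DEFAULT =
        ThreadLocal.withInitial(DEFAULT::newInstance);

    // Reuse the thread's cached default instance; only build a fresh
    // instance when the caller passes a custom serializer.
    static Object instanceFor(SketchSerializer custom) {
        return custom == null ? CACHED_DEFAULT.get() : custom.newInstance();
    }

    public static void main(String[] args) {
        instanceFor(null);
        instanceFor(null); // served from the thread-local cache
        SketchSerializer custom = new SketchSerializer();
        instanceFor(custom);
        instanceFor(custom); // custom serializers are not cached here
        System.out.println("default instances: " + DEFAULT.instancesCreated.get());
        System.out.println("custom instances: " + custom.instancesCreated.get());
    }
}
```

With this shape, the cost of building the default instance is paid once per thread rather than once per deserialize call. Whether that is acceptable in Spark depends on how long the threads live and how much state each instance holds, which is presumably why a bounded pool is also being considered.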