Andrew Or created SPARK-12473:
---------------------------------

             Summary: Reuse serializer instances for performance
                 Key: SPARK-12473
                 URL: https://issues.apache.org/jira/browse/SPARK-12473
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: Andrew Or
            Assignee: Andrew Or
After commit de02782, page rank regressed from 242s to 260s, about 7%. The commit added 26 types that are registered every time we create a Kryo serializer instance. I ran a small microbenchmark to show that this is noticeably expensive:

{code}
import org.apache.spark.serializer._
import org.apache.spark.SparkConf

// Returns the wall-clock time (ms) needed to create `num` Kryo instances,
// including the class registrations done by newKryo().
def makeMany(num: Int): Long = {
  val start = System.currentTimeMillis
  (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() }
  System.currentTimeMillis - start
}

// Before commit de02782, averaged over multiple runs:
makeMany(5000)  // ~1500 ms
// After commit de02782, averaged over multiple runs:
makeMany(5000)  // ~2750 ms
{code}

That is an extra (2750 - 1500) / 5000 = 0.25 ms per instance. Since we create multiple serializer instances per partition, a 5000-partition stage will unconditionally see an increase of more than 1s (already 5000 * 0.25 ms = 1.25 s at one instance per partition). In page rank, we may run many such stages.
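As the summary suggests, one way to avoid paying the registration cost on every use is to build the expensive object once per thread and reuse it. The sketch below is only an illustration under stated assumptions, not the actual fix for this issue: `ExpensiveSerializer` and `SerializerCache` are hypothetical names, the class merely simulates registration and counts how often it is constructed, and no Spark or Kryo API is used. A per-thread cache (rather than one shared instance) is used because Kryo instances are not thread-safe.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a Kryo instance: construction is the expensive
// part (registering many classes), so we count how often it happens.
object ExpensiveSerializer {
  val constructed = new AtomicInteger(0)
}

class ExpensiveSerializer(registeredTypes: Seq[Class[_]]) {
  ExpensiveSerializer.constructed.incrementAndGet()

  // Simulated registration table, rebuilt for every new instance.
  private val ids: Map[Class[_], Int] = registeredTypes.zipWithIndex.toMap

  def idOf(c: Class[_]): Option[Int] = ids.get(c)
}

// Hypothetical cache: each thread lazily builds one instance and keeps it,
// so registration runs once per thread instead of once per use.
object SerializerCache {
  private val types: Seq[Class[_]] =
    Seq(classOf[String], classOf[Array[Int]], classOf[Array[Byte]])

  private val perThread: ThreadLocal[ExpensiveSerializer] =
    ThreadLocal.withInitial[ExpensiveSerializer](() => new ExpensiveSerializer(types))

  def get(): ExpensiveSerializer = perThread.get()
}

object ReuseDemo {
  def main(args: Array[String]): Unit = {
    val before = ExpensiveSerializer.constructed.get
    (1 to 5000).foreach { _ => SerializerCache.get().idOf(classOf[String]) }
    // Only the first call on this thread constructs an instance.
    println(ExpensiveSerializer.constructed.get - before) // prints 1
  }
}
```

With this pattern, 5000 lookups on one thread construct a single instance instead of 5000. An object pool is an alternative design when instances must be handed between threads; which one fits depends on how serializer instances are obtained per task.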