Re: Why Kryo Serializer is slower than Java Serializer in TeraSort
Hi. Just a few quick comments on your question. If you drill in (click the links of the subtasks), you can get a more detailed view of the tasks. One of the things reported is the time spent on serialization. If that is your dominant factor, it should be reflected there, right? Are you sure the input data is not getting cached between runs (i.e. does the order of the experiments matter, and did you explicitly flush the operating system memory between runs)? If you now run the old experiment again, does it take the same amount of time again? Did you validate that the results were actually correct? Hope this helps. Regards, Gylfi. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-Kryo-Serializer-is-slower-than-Java-Serializer-in-TeraSort-tp23621p23659.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Why Kryo Serializer is slower than Java Serializer in TeraSort
Hi,
I am using the TeraSort benchmark from ehiggs's branch (https://github.com/ehiggs/spark-terasort). I noticed that TeraSort.scala uses the Kryo serializer, so I made a small change from org.apache.spark.serializer.KryoSerializer to org.apache.spark.serializer.JavaSerializer to see the time difference. Curiously, the Java serializer is much quicker than Kryo, and no error is reported when I run the program. Here are the records from the history server; the first is Kryo, the second is the Java default.
1. http://apache-spark-user-list.1001560.n3.nabble.com/file/n23621/kryo.png
2. http://apache-spark-user-list.1001560.n3.nabble.com/file/n23621/java.png
I am wondering whether I did something wrong or whether there is some other reason behind this result.
Thanks for any help,
Gavin
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-Kryo-Serializer-is-slower-than-Java-Serializer-in-TeraSort-tp23621.html
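For anyone reproducing the comparison, the change described above comes down to a single Spark configuration value. The following is only a minimal sketch, assuming the serializer is selected through `spark.serializer` on a `SparkConf`. The object name `SerializerSwitch`, the `useKryo` flag, and the application name are hypothetical and are not taken from TeraSort.scala.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper, not part of spark-terasort: builds a SparkConf
// that differs between the two runs only in the serializer class.
object SerializerSwitch {
  def confFor(useKryo: Boolean): SparkConf = {
    val serializer =
      if (useKryo) "org.apache.spark.serializer.KryoSerializer"
      else "org.apache.spark.serializer.JavaSerializer"
    new SparkConf()
      .setAppName("terasort-serializer-comparison") // assumed name
      .set("spark.serializer", serializer)
  }

  def main(args: Array[String]): Unit = {
    // Pass "kryo" as the first argument to select Kryo; anything else selects Java.
    val useKryo = args.nonEmpty && args(0) == "kryo"
    val conf = confFor(useKryo)
    // Print the effective settings so each run records which serializer it used.
    println(conf.toDebugString)
  }
}
```

Keeping everything else in the configuration identical makes it easier to attribute a timing difference to the serializer rather than to some other setting.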
Re: Why Kryo Serializer is slower than Java Serializer in TeraSort
It looks like it spent more time writing/transferring the 40 GB of shuffle data when you used Kryo. And, surprisingly, the JavaSerializer run has only 700 MB of shuffle? Thanks, Best Regards
On Sun, Jul 5, 2015 at 12:01 PM, Gavin Liu ilovesonsofanar...@gmail.com wrote:
> [...]
Re: Why Kryo Serializer is slower than Java Serializer in TeraSort
That code doesn't appear to be registering classes with Kryo, which means the fully-qualified classname is stored with every Kryo record. The Spark documentation has more on this: https://spark.apache.org/docs/latest/tuning.html#data-serialization Regards, Will
On July 5, 2015, at 2:31 AM, Gavin Liu ilovesonsofanar...@gmail.com wrote:
> [...]
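To illustrate the registration point above, here is a minimal sketch of registering classes with Kryo through `SparkConf.registerKryoClasses`, as described in the linked tuning guide. The `MyRecord` case class and the application name are hypothetical stand-ins. Which classes the TeraSort job actually shuffles is an assumption that would need to be checked against TeraSort.scala.

```scala
import org.apache.spark.SparkConf

// Hypothetical record type standing in for whatever the job shuffles;
// it is not taken from spark-terasort.
case class MyRecord(key: Array[Byte], value: Array[Byte])

object KryoRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-registration-sketch") // assumed name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: fail fast on any unregistered class instead of
      // falling back to writing the class name with each object.
      .set("spark.kryo.registrationRequired", "true")
      // Registered classes are written as a compact ID rather than
      // a fully-qualified class name.
      .registerKryoClasses(Array[Class[_]](classOf[MyRecord], classOf[Array[Byte]]))

    // Print the effective settings for inspection.
    println(conf.toDebugString)
  }
}
```

With `spark.kryo.registrationRequired` enabled, a job that serializes an unregistered class fails with an error naming that class, which is one way to find out what still needs registering.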