Hi. Just a few quick comments on your question.
If you drill into the subtasks (click their link), you get a more detailed view of the individual tasks. One of the things reported there is the time spent on serialization. If serialization is your dominant factor, it should be reflected there, right?

Are you sure the input data is not getting cached between runs? That is, does the order of the experiments matter, and did you explicitly flush the operating system memory between runs?

If you now run the old experiment again, does it take the same amount of time as before?

Did you validate that the results were actually correct?

Hope this helps.

Regards,
Gylfi.
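
P.S. To make the ordering question concrete, here is a minimal sketch in Python of how one could check whether run order affects the timings. It is only an illustration, not anything Spark provides: `time_interleaved`, `summarize`, and the two placeholder jobs are hypothetical names, and the placeholders just sleep where you would launch your real Java-serializer and Kryo-serializer runs. It also does not flush the operating system cache; that would still have to be done between runs by whatever means your system offers.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_interleaved(experiments: Dict[str, Callable[[], object]],
                     rounds: int = 3) -> Dict[str, List[float]]:
    """Run every experiment once per round, reversing the order on odd rounds.

    If one configuration is only fast when it runs second, cached input
    data (rather than the serializer) is a likely explanation.
    """
    names = list(experiments)
    timings: Dict[str, List[float]] = {name: [] for name in names}
    for round_index in range(rounds):
        # Alternate the order so no configuration always runs first.
        order = names if round_index % 2 == 0 else list(reversed(names))
        for name in order:
            start = time.perf_counter()
            experiments[name]()
            timings[name].append(time.perf_counter() - start)
    return timings


def summarize(timings: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Report the first run separately, since it is the one most likely to be cold."""
    return {
        name: {
            "first": runs[0],
            "median": statistics.median(runs),
            "min": min(runs),
            "max": max(runs),
        }
        for name, runs in timings.items()
    }


if __name__ == "__main__":
    # Hypothetical stand-ins: replace the bodies with code that launches
    # the actual job under each serializer setting and waits for it.
    def run_with_java_serializer() -> None:
        time.sleep(0.02)

    def run_with_kryo_serializer() -> None:
        time.sleep(0.02)

    results = time_interleaved(
        {"java": run_with_java_serializer, "kryo": run_with_kryo_serializer},
        rounds=4,
    )
    for name, stats in summarize(results).items():
        print(name, {key: round(value, 4) for key, value in stats.items()})
```

If the "first" time for a configuration is much larger than its median, or the ranking flips depending on which one ran first, I would suspect caching before blaming the serializer.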