I am implementing wordcount on a Spark cluster (1 master, 3 slaves) in
standalone mode. I have 546 GB of data, and the dfs.blocksize I set is 256 MB.
Therefore, the number of tasks is 2186. Each of my 3 slaves uses 22 cores and
72 GB of memory for the processing, so the computing ability of each slave
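For reference, the task count above follows from dividing the input size by the HDFS block size; a quick back-of-the-envelope check using the numbers from the message:

```python
# Rough check: 546 GB of input split into 256 MB HDFS blocks.
data_mb = 546 * 1024   # total input size in MB
block_mb = 256         # dfs.blocksize in MB

tasks = data_mb // block_mb
print(tasks)  # 2184 full blocks; trailing partial blocks per file push the count to ~2186
```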
Hi,
I am using the TeraSort benchmark from ehiggs's branch:
https://github.com/ehiggs/spark-terasort
Then I noticed that in
TeraSort.scala it is using the Kryo serializer. So I made a small change from
org.apache.spark.serializer.KryoSerializer to