Hi all, When I run WordCount on Spark, I find that when I set "spark.default.parallelism" to different values, the Shuffle Write and Shuffle Read sizes change as well (I read these figures from the history server's web UI). Is this because the shuffle write size also includes some metadata?
Also, my input file for WordCount is approximately 3 kB (stored on the local filesystem), and I partitioned it into 10 pieces using the textFile function. However, the web UI shows WordCount's input size as 19.5 kB, much larger than the actual file size. Why would that happen? Many thanks!
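
For reference, here is a minimal sketch of the kind of job I am describing. The file path, app name, and the value "10" are placeholders, not my exact code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      // varying this value changes the reported shuffle sizes in the UI
      .set("spark.default.parallelism", "10")
    val sc = new SparkContext(conf)

    // the second argument to textFile is minPartitions -- a hint,
    // so the actual partition count may differ from what is requested
    val lines = sc.textFile("file:///path/to/input.txt", 10)
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // the shuffle happens here

    counts.collect().foreach(println)
    sc.stop()
  }
}
```

Note that minPartitions is only a lower bound, so I am not certain exactly how the 3 kB file is being split across the 10 tasks.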