Hi all, When I run WordCount on Spark, I find that when I set "spark.default.parallelism" to different values, the Shuffle Write and Shuffle Read sizes change as well (I read these figures from the history server's web UI). Is this because the shuffle write size also includes some metadata?
Also, my input file for WordCount is approximately 3 kB (stored on the local filesystem), and I partitioned it into 10 pieces using the textFile function. However, the web UI shows WordCount's input size as 19.5 kB, much larger than the actual file size. Why would that happen? Many thanks!
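
For reference, here is a minimal sketch of the kind of job I am describing. The file path, app name, and the value "10" are placeholders, not my exact code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      // varying this value changes the reported shuffle sizes in the UI
      .set("spark.default.parallelism", "10")
    val sc = new SparkContext(conf)

    // the second argument to textFile is minPartitions -- a hint,
    // so the actual partition count may differ from what is requested
    val lines = sc.textFile("file:///path/to/input.txt", 10)
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // the shuffle happens here

    counts.collect().foreach(println)
    sc.stop()
  }
}
```

Note that minPartitions is only a lower bound, so I am not certain exactly how the 3 kB file is being split across the 10 tasks.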