Hi all,

I have a simple ETL job that reads some data, shuffles it, and writes it back out. It runs on AWS EMR 5.4.0 with Spark 2.1.0.
After Stage 0 completes and the job starts Stage 1, I see a huge slowdown. CPU usage on the cluster is low, as is network I/O. From the Spark stats, I see large values for Shuffle Read Blocked Time. As an example, one of my tasks completed in 18 minutes but spent 15 of those minutes waiting on remote reads.

I'm not sure why the shuffle is so slow. Are there things I can do to increase the performance of the shuffle?

Thanks,
Pradeep
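In case it helps, here is a sketch of the shuffle-related settings the job is running with. I have not tuned anything yet, so these values are (as far as I can tell) just the stock Spark 2.1.0 defaults, and I'm not sure which of them, if any, are relevant here:

```
# spark-defaults.conf fragment -- values below are what I believe are the
# Spark 2.1.0 defaults; nothing here has been changed from stock.
spark.reducer.maxSizeInFlight          48m    # max buffered shuffle data fetched per reduce task
spark.shuffle.io.numConnectionsPerPeer 1      # connections between each pair of hosts for fetches
spark.shuffle.io.maxRetries            3      # fetch retries before a shuffle fetch fails
spark.shuffle.compress                 true   # compress map output files
spark.sql.shuffle.partitions           200    # shuffle partitions for DataFrame/SQL operations
```

If the slowdown points at one of these (e.g. too few connections per peer, or a fetch buffer that's too small for the cluster), I'd appreciate guidance on which to adjust.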