Hi All,

It turns out the bottleneck in my job was the EBS volumes: I/O wait times were very high across the cluster. I was using only 1 volume; increasing to 4 made the job faster.
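
For anyone who wants to check for the same symptom, one way is to sample the aggregate "cpu" line of /proc/stat twice on a node and see what share of CPU time went to iowait. The Python sketch below is only an illustration, not what was run on this cluster: it assumes a Linux host, and the function names are made up for the example.

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two samples of the 'cpu' line.

    Only the first eight counters (user, nice, system, idle, iowait, irq,
    softirq, steal) are summed, because the guest counters that follow are
    already included in user and nice.
    """
    b = parse_cpu_line(before)[:8]
    a = parse_cpu_line(after)[:8]
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    # iowait is the fifth counter (index 4)
    return deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


# Example use on a Linux node (not executed here):
#   import time
#   first = read_cpu_line()
#   time.sleep(5)
#   second = read_cpu_line()
#   print(f"iowait share: {iowait_fraction(first, second):.1%}")
```

A persistently large iowait share while CPU and network usage stay low would be consistent with a disk bottleneck like the one described above.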

Thanks,
Pradeep

On Thu, Apr 20, 2017 at 3:12 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> Hi All,
>
> I have a simple ETL job that reads some data, shuffles it and writes it
> back out. This is running on AWS EMR 5.4.0 using Spark 2.1.0.
>
> After Stage 0 completes and the job starts Stage 1, I see a huge slowdown
> in the job. The CPU usage is low on the cluster, as is the network I/O.
> From the Spark Stats, I see large values for the Shuffle Read Blocked Time.
> As an example, one of my tasks completed in 18 minutes, but spent 15
> minutes waiting for remote reads.
>
> I'm not sure why the shuffle is so slow. Are there things I can do to
> increase the performance of the shuffle?
>
> Thanks,
> Pradeep