Shardul Mahadik created SPARK-36215:
---------------------------------------

             Summary: Add logging for slow fetches to diagnose external shuffle service issues
                 Key: SPARK-36215
                 URL: https://issues.apache.org/jira/browse/SPARK-36215
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 3.2.0
            Reporter: Shardul Mahadik


Currently we can see from the metrics that a task or stage has slow fetches, and the logs indicate _all_ of the shuffle servers those tasks were fetching from, but often this is a big set (dozens or even hundreds) and narrowing down which one caused issues can be very difficult.

We should add some logging when a fetch is "slow" as determined by some preconfigured thresholds.
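
To make the proposal concrete, here is a minimal sketch of one way such a threshold check might look. Everything in it is an assumption for illustration: the class name SlowFetchDetector, the two thresholds (a minimum duration and a minimum throughput), and the message format are hypothetical and are not taken from Spark or from this issue, which does not say what the thresholds should be.

```java
import java.util.Optional;

/**
 * Hypothetical helper, not part of Spark: flags a fetch as "slow" when it
 * took longer than a duration threshold AND its throughput fell below a
 * minimum rate. Both thresholds are assumed to come from configuration.
 */
final class SlowFetchDetector {
    private final long durationThresholdMs; // fetches at or under this are never flagged
    private final long minBytesPerSec;      // throughput below this counts as slow

    SlowFetchDetector(long durationThresholdMs, long minBytesPerSec) {
        if (durationThresholdMs < 0 || minBytesPerSec < 0) {
            throw new IllegalArgumentException("thresholds must be non-negative");
        }
        this.durationThresholdMs = durationThresholdMs;
        this.minBytesPerSec = minBytesPerSec;
    }

    /** Returns true only if the fetch exceeded the duration threshold and was below the rate threshold. */
    boolean isSlow(long bytesFetched, long durationMs) {
        if (durationMs <= durationThresholdMs) {
            return false;
        }
        // durationMs > durationThresholdMs >= 0 here, so the division is safe.
        double bytesPerSec = bytesFetched * 1000.0 / durationMs;
        return bytesPerSec < minBytesPerSec;
    }

    /** Builds a log message naming the shuffle server, or empty if the fetch was not slow. */
    Optional<String> check(String shuffleServer, long bytesFetched, long durationMs) {
        if (!isSlow(bytesFetched, durationMs)) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "Slow fetch from %s: %d bytes in %d ms", shuffleServer, bytesFetched, durationMs));
    }
}
```

A caller would time each fetch, pass the server address, byte count, and elapsed time to check, and log the returned message at warning level if one is present. Requiring both conditions is a design choice in this sketch: it keeps large but healthy transfers out of the log, so the entries that remain point at the specific shuffle server that was slow.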