Shardul Mahadik created SPARK-36215:
---------------------------------------

             Summary: Add logging for slow fetches to diagnose external shuffle service issues
                 Key: SPARK-36215
                 URL: https://issues.apache.org/jira/browse/SPARK-36215
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 3.2.0
            Reporter: Shardul Mahadik


Currently, the metrics show that a task or stage has slow fetches, and the 
logs list _all_ of the shuffle servers those tasks were fetching from. That 
set is often large (dozens or even hundreds of servers), so narrowing down 
which one caused the problem can be very difficult. We should log a message 
whenever a fetch is "slow", as determined by preconfigured thresholds.
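
As a rough illustration of the idea, the sketch below flags a fetch as slow 
when it both ran longer than a minimum duration and fell below a minimum 
throughput, and builds a message naming the one server involved. It is only 
a sketch under assumptions: the class SlowFetchDetector, its two thresholds, 
the message format, and the host name are all hypothetical and are not taken 
from Spark's code or configuration.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Spark's API. */
public final class SlowFetchDetector {
    // Assumed threshold: fetches shorter than this are never reported.
    private final long minDurationMs;
    // Assumed threshold: throughput below this many bytes/s counts as slow.
    private final long minBytesPerSec;

    public SlowFetchDetector(long minDurationMs, long minBytesPerSec) {
        this.minDurationMs = minDurationMs;
        this.minBytesPerSec = minBytesPerSec;
    }

    /** Throughput in bytes per second; a zero duration is treated as 1 ms. */
    static long bytesPerSec(long bytes, long durationMs) {
        return bytes * 1000L / Math.max(durationMs, 1L);
    }

    /** A fetch is slow only if it was both long-running and low-throughput. */
    public boolean isSlow(long bytes, long durationMs) {
        return durationMs >= minDurationMs
            && bytesPerSec(bytes, durationMs) < minBytesPerSec;
    }

    /** Returns the warning text for a slow fetch, or null if it was not slow. */
    public String slowFetchMessage(String host, int port, long bytes, long durationMs) {
        if (!isSlow(bytes, durationMs)) {
            return null;
        }
        return String.format(Locale.ROOT,
            "Slow fetch from %s:%d: %d bytes in %d ms (%d bytes/s)",
            host, port, bytes, durationMs, bytesPerSec(bytes, durationMs));
    }

    public static void main(String[] args) {
        // Illustrative thresholds: at least 500 ms and under 1 MB/s.
        SlowFetchDetector detector = new SlowFetchDetector(500, 1_000_000);
        String msg = detector.slowFetchMessage("shuffle-host-17", 7337, 2_000_000, 8_000);
        if (msg != null) {
            System.err.println("WARN " + msg);
        }
    }
}
```

Requiring both conditions is one possible design choice: small fetches that 
finish quickly are not reported even if their throughput looks low, which 
keeps the log focused on the servers that actually delayed a task.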
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

