[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Guinther updated SPARK-38792:
-----------------------------------
    Attachment: executor-timing-debug-number-5.jpg

> Regression in time executor takes to do work sometime after v3.0.1?
> -------------------------------------------------------------------
>
>                 Key: SPARK-38792
>                 URL: https://issues.apache.org/jira/browse/SPARK-38792
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Danny Guinther
>            Priority: Major
>         Attachments: dummy-job-job.jpg, dummy-job-query.png,
> executor-timing-debug-number-2.jpg, executor-timing-debug-number-4.jpg,
> executor-timing-debug-number-5.jpg, min-time-way-up.jpg,
> what-s-up-with-exec-actions.jpg
>
> Hello!
>
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression
> in performance when upgrading from 3.0.1 to 3.2.1, and I can't pin down
> why. I don't believe it is specific to my application, since the upgrade
> from 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents
> itself in my application due to the high volume of work my application
> does, but I could be mistaken.
>
> The gist is that the executor actions I'm running suddenly appear to take
> a lot longer on Spark 3.2.1. I don't have any ability to test versions
> between 3.0.1 and 3.2.1 because my application was previously blocked from
> upgrading beyond Spark 3.0.1 by
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
>
> Any ideas what might cause this, or metrics I might try to gather to
> pinpoint the problem? I've tried a number of the suggestions from
> https://spark.apache.org/docs/latest/tuning.html to see if any of those
> help, but none of the adjustments I've tried have been fruitful. I also
> looked through https://spark.apache.org/docs/latest/sql-migration-guide.html
> for ideas as to what might have changed to cause this behavior, but nothing
> there stands out as a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by
> executor actions. In the image, the blue and purple lines are two different
> kinds of reads using the built-in JDBC data reader, and the green line is
> writes using a custom-built data writer. The deploy that switched from
> 3.0.1 to 3.2.1 occurred at 9 AM on the graph. The graph data comes from
> timing blocks that surround only the calls to DataFrame actions, so there
> shouldn't be anything specific to my application that is suddenly inflating
> these numbers. The specific actions I'm invoking are: count() (but there's
> some transforming and caching going on, so it's really more than that);
> first(); and write().
>
> The driver process does seem to be seeing more GC churn than with Spark
> 3.0.1, but I don't think that explains this behavior. The executors don't
> seem to have any problem with memory or GC and are not overutilized (our
> pipeline is very read- and write-heavy and less heavy on transformations,
> so executors tend to be idle while waiting on various network I/O).
>
> Thanks in advance for any help!

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
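The "timing blocks that surround only the calls to DataFrame actions" described in the report can be sketched roughly as follows. This is a minimal, hypothetical illustration using only the Python standard library; the `timed_action` helper and the `samples` structure are made-up names, not code from the reporter's application, and the summed range stands in for a real action such as `df.count()`:

```python
# Sketch of a timing block that measures only the action call itself,
# so application logic outside the action is excluded from the metric.
import time
from contextlib import contextmanager

@contextmanager
def timed_action(label, samples):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Append the elapsed time even if the action raises.
        samples.setdefault(label, []).append(time.monotonic() - start)

samples = {}
with timed_action("count", samples):
    result = sum(range(1000))  # stand-in for an action like df.count()
```

Because the context manager brackets nothing but the action, a jump in these numbers after the 3.0.1 to 3.2.1 upgrade would point at time spent inside Spark's job execution rather than in the surrounding application code.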