[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
-----------------------------------
    Description: 
Hello!

I'm sorry to trouble you with this, but I'm seeing a noticeable performance
regression when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't
believe it is specific to my application, since the upgrade from 3.0.1 to 3.2.1
is purely a configuration change on my side. I'd guess the regression shows up
in my application because of the high volume of work it does, but I could be
mistaken.

The gist is that the executor actions I'm running suddenly appear to take a lot
longer on Spark 3.2.1. I have no way to test versions between 3.0.1 and 3.2.1
because my application was previously blocked from upgrading beyond Spark 3.0.1
by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).

Any ideas what might cause this, or metrics I might try to gather to pinpoint
the problem? I've tried a number of the suggestions from
[https://spark.apache.org/docs/latest/tuning.html], but none of the adjustments
have been fruitful. I also looked through
[https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas about
what might have changed to cause this behavior, but nothing there stands out as
a likely source of the problem.
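
In case it helps frame the question, this is roughly the kind of per-task metric
collection I could add to see where the extra time goes (a minimal Scala sketch,
not code from my application; the registration comment at the bottom assumes an
existing SparkSession named spark):

{code:scala}
import java.util.concurrent.atomic.LongAdder

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener that sums a few task-level timings so a 3.0.1 run and a
// 3.2.1 run can be compared component by component.
class TaskTimingListener extends SparkListener {
  val runTimeMs = new LongAdder
  val gcTimeMs = new LongAdder
  val deserializeTimeMs = new LongAdder
  val shuffleFetchWaitMs = new LongAdder

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      runTimeMs.add(m.executorRunTime)
      gcTimeMs.add(m.jvmGCTime)
      deserializeTimeMs.add(m.executorDeserializeTime)
      shuffleFetchWaitMs.add(m.shuffleReadMetrics.fetchWaitTime)
    }
  }

  def report(): String =
    s"runTime=${runTimeMs.sum}ms gcTime=${gcTimeMs.sum}ms " +
      s"deserialize=${deserializeTimeMs.sum}ms shuffleFetchWait=${shuffleFetchWaitMs.sum}ms"
}

// Registration, assuming an existing SparkSession named `spark`:
//   val listener = new TaskTimingListener
//   spark.sparkContext.addSparkListener(listener)
//   ... run the workload ...
//   println(listener.report())
{code}

If there are other task- or stage-level metrics that would be more useful here,
I'm happy to collect those instead.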

I have attached a graph that shows the drastic change in time taken by executor
actions. In the image, the blue and purple lines are different kinds of reads
using the built-in JDBC data reader, and the green line is writes using a
custom-built data writer. The deploy that switched from 3.0.1 to 3.2.1 happened
at 9 AM on the graph. The graph data comes from timing blocks that surround only
the calls to DataFrame actions, so there shouldn't be anything specific to my
application that is suddenly inflating these numbers.
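
To be concrete about what I mean by timing blocks, they are roughly of this
shape (a simplified Scala sketch with placeholder names, not my actual code):

{code:scala}
import org.apache.spark.sql.SparkSession

object ActionTimingSketch {
  // Wraps only the DataFrame action itself and records wall-clock time.
  def timed[T](label: String)(block: => T): T = {
    val start = System.nanoTime()
    try block
    finally {
      val elapsedMs = (System.nanoTime() - start) / 1e6
      println(f"$label%s took $elapsedMs%.1f ms") // the real app reports this to our metrics system
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("timing-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = (1 to 1000).toDF("id")

    // Only the action call sits inside the timing block; setup and transformations do not.
    val rows = timed("count-action") { df.count() }
    println(rows)
    spark.stop()
  }
}
{code}

So the numbers in the graph should only reflect time spent inside the action
calls themselves.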

The driver process does seem to be seeing more GC churn than with Spark 3.0.1,
but I don't think that explains this behavior. The executors don't seem to have
any problem with memory or GC and are not overutilized (our pipeline is very
read- and write-heavy and lighter on transformations, so executors tend to sit
idle while waiting on various network I/O).
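
If it helps, I can spot-check the driver-side GC churn with something like the
following (a small Scala sketch using the standard JMX beans; nothing
Spark-specific, just run inside the driver JVM):

{code:scala}
import java.lang.management.ManagementFactory

// Prints cumulative GC counts and times for the current JVM.
object DriverGcSnapshot {
  def snapshot(): Unit =
    ManagementFactory.getGarbageCollectorMXBeans.forEach { gc =>
      println(s"${gc.getName}: collections=${gc.getCollectionCount}, timeMs=${gc.getCollectionTime}")
    }

  def main(args: Array[String]): Unit = snapshot()
}
{code}

Comparing snapshots like that over time would at least show whether the extra
churn lines up with the slow actions.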

 

Thanks in advance for any help!



> Regression in time executor takes to do work since v3.0.1 ?
> -----------------------------------------------------------
>
>                 Key: SPARK-38792
>                 URL: https://issues.apache.org/jira/browse/SPARK-38792
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Danny Guinther
>            Priority: Major
>         Attachments: what-s-up-with-exec-actions.jpg
>



