[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-13 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521852#comment-17521852
 ] 

Danny Guinther commented on SPARK-38792:


I'm getting the impression that the problem may be with some code that 
Databricks bolts on to Spark. I'd say ignore this ticket unless you hear 
otherwise.

> Regression in time executor takes to do work sometime after v3.0.1 ?
> 
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Danny Guinther
>Priority: Major
> Attachments: dummy-job-job.jpg, dummy-job-query.png, 
> executor-timing-debug-number-2.jpg, executor-timing-debug-number-4.jpg, 
> executor-timing-debug-number-5.jpg, min-time-way-up.jpg, 
> what-is-this-code.jpg, what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
> don't believe it is specific to my application since the upgrade from 3.0.1 to 
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my 
> application due to the high volume of work my application does, but I could 
> be mistaken.
> The gist is that it seems like the executor actions I'm running suddenly 
> appear to take a lot longer on Spark 3.2.1. I don't have any ability to test 
> versions between 3.0.1 and 3.2.1 because my application was previously 
> blocked from upgrading beyond Spark 3.0.1 by 
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this or metrics I might try to gather to pinpoint 
> the problem? I've tried a bunch of the suggestions from 
> [https://spark.apache.org/docs/latest/tuning.html] to see if any of those 
> help, but none of the adjustments I've tried have been fruitful. I also tried 
> to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] 
> for ideas as to what might have changed to cause this behavior, but haven't 
> seen anything that sticks out as being a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by 
> executor actions. In the image the blue and purple lines are different kinds 
> of reads using the built-in JDBC data reader and the green line is writes 
> using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 
> occurred at 9AM on the graph. The graph data comes from timing blocks that 
> surround only the calls to dataframe actions, so there shouldn't be anything 
> specific to my application that is suddenly inflating these numbers. The 
> specific actions I'm invoking are: count() (but there's some transforming and 
> caching going on, so it's really more than that); first(); and write().
> The driver process does seem to be seeing more GC churn than with Spark 
> 3.0.1, but I don't think that explains this behavior. The executors don't 
> seem to have any problem with memory or GC and are not overutilized (our 
> pipeline is very read and write heavy, less heavy on transformations, so 
> executors tend to be idle while waiting for various network I/O).
>  
> Thanks in advance for any help!






[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-12 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: what-is-this-code.jpg




[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-12 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521342#comment-17521342
 ] 

Danny Guinther commented on SPARK-38792:


Where does org.apache.spark.sql.execution.collect.Collector live? I can't find 
it, and New Relic suggests that the problem may stem from some classes in 
org.apache.spark.sql.execution.collect.*

 

See attached screenshot named what-is-this-code.jpg
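
For anyone who wants to check the same thing, here is one way to see where a 
class is loaded from at runtime (a sketch, assuming a Spark shell on the 
affected cluster; the class name is taken from the New Relic trace above):
{code:java}
// Sketch: resolve the class and print which jar (code source) it came from.
// A missing code source usually means the class was defined at runtime
// rather than shipped in a jar on the classpath.
val cls = Class.forName("org.apache.spark.sql.execution.collect.Collector")
val location = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
println(location.getOrElse("no code source"))
{code}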




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-07 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: executor-timing-debug-number-4.jpg




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-07 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: executor-timing-debug-number-2.jpg




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-07 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: executor-timing-debug-number-5.jpg




[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-07 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518998#comment-17518998
 ] 

Danny Guinther commented on SPARK-38792:


I added yet another kind of dummy job that aims to measure:
 # Time between driver preparing data frame to read and adding a transform to 
said dataframe
 # Time between the driver adding a literal column to the dataframe and the 
executor seeing that literal column
 # Time between the executor seeing the literal column and adding another 
column via udf
 # Time between the executor adding another column via udf and control being 
returned to the driver
 # Time between the driver calling first on the dataframe and control returning 
to the driver

 

The code looks something like this:
{code:java}
import java.time.Instant

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{lit, udf}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val RecordSchema = StructType(Seq(
  StructField("driverReadEpochMillis", LongType, false)
))

def mkSourceDataFrame(sparkSession: SparkSession): DataFrame = {
  sparkSession.createDataFrame(
    sparkSession.sparkContext.makeRDD(Seq[Row](Row(Instant.now.toEpochMilli)), 1),
    RecordSchema
  )
}

val df = mkSourceDataFrame(sparkSession)
  .withColumn("driverTransformEpochMillis", lit(Instant.now.toEpochMilli))
  .withColumn("executorTransformEpochMillis", (udf { () => Instant.now.toEpochMilli })())
  .withColumn("executorTransformAgainEpochMillis", (udf { () => Instant.now.toEpochMilli })())

val count = df.count
val beforeFirstEpochMillis = Instant.now.toEpochMilli
val row = df.first
val afterFirstEpochMillis = Instant.now.toEpochMilli
val driverReadEpochMillis = row.getAs[Long]("driverReadEpochMillis")
val driverTransformEpochMillis = row.getAs[Long]("driverTransformEpochMillis")
val executorTransformEpochMillis = row.getAs[Long]("executorTransformEpochMillis")
val executorTransformAgainEpochMillis = row.getAs[Long]("executorTransformAgainEpochMillis")
val statsEvent = Map[String, Any](
  "event" -> "executor-timing-debug",
  "driverReadEpochMillis" -> driverReadEpochMillis,
  "driverTransformEpochMillis" -> driverTransformEpochMillis,
  "driverReadToDriverTransformDeltaMillis" -> (driverTransformEpochMillis - driverReadEpochMillis), // #1
  "executorTransformEpochMillis" -> executorTransformEpochMillis,
  "driverTransformToExecutorTransformDeltaMillis" -> (executorTransformEpochMillis - driverTransformEpochMillis), // #2
  "executorTransformAgainEpochMillis" -> executorTransformAgainEpochMillis,
  "executorTransformToExecutorTransformAgainDeltaMillis" -> (executorTransformAgainEpochMillis - executorTransformEpochMillis), // #3
  "driverBeforeFirstEpochMillis" -> beforeFirstEpochMillis,
  "executorTransformAgainToDriverBeforeFirstDeltaMillis" -> (beforeFirstEpochMillis - executorTransformAgainEpochMillis), // #4
  "driverAfterFirstEpochMillis" -> afterFirstEpochMillis,
  "driverBeforeFirstEpochMillisToDriverAfterFirstDeltaMillis" -> (afterFirstEpochMillis - beforeFirstEpochMillis) // #5
)
{code}
 

I think the results of running this job on Spark 3.0.1 vs. Spark 3.2.1 help 
narrow down the source of the problem. The results are as follows:
 # Time between driver preparing data frame to read and adding a transform to 
said dataframe
 ## NO CHANGE
 # Time between the driver adding a literal column to the dataframe and the 
executor seeing that literal column
 ## DEFINITE REGRESSION: See attached screenshot named 
executor-timing-debug-number-2.jpg
 # Time between the executor seeing the literal column and adding another 
column via udf
 ## NO CHANGE
 # Time between the executor adding another column via udf and control being 
returned to the driver
 ## DEFINITE REGRESSION: See attached screenshot named 
executor-timing-debug-number-4.jpg
 ## The metric names here are a little inconsistent with the example code 
above: I added executorTransformAgainEpochMillis after the fact and had to 
rename this metric, so this screenshot predates the rename. I assure you this 
is the right data.
 # Time between the driver calling first on the dataframe and control returning 
to the driver
 ## DEFINITE REGRESSION: See attached screenshot named 
executor-timing-debug-number-5.jpg

 

The commonality among the metrics that suffered regressions is that they are 
all points where control transfers from the driver to the executor and/or from 
the executor back to the driver.

I wonder if this is something that I can replicate on versions of Spark between 
3.0.1 and 3.2.1 despite the bottleneck caused by 
https://issues.apache.org/jira/browse/SPARK-37391 ; I'll give it a try.
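
If it helps anyone reproduce, bisecting would just mean pinning an intermediate 
release in the build, something like (a sketch; the version is chosen 
arbitrarily):
{code:java}
// Hypothetical sbt pin for bisecting the regression across releases
// between 3.0.1 and 3.2.1 (e.g. 3.1.3).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.3" % Provided,
  "org.apache.spark" %% "spark-sql"  % "3.1.3" % Provided
)
{code}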

 

Is there anybody out there?

 


[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518460#comment-17518460
 ] 

Danny Guinther commented on SPARK-38792:


I added another kind of dummy job that doesn't farm work out to the executors 
at all and instead runs entirely on the driver while exercising most of the 
code paths that a normal data flow would. Interestingly, it seems unimpacted by 
the upgrade from 3.0.1 to 3.2.1, which suggests that the issue is strongly 
related to passing work to the executor or to the executor doing work. I'd be 
interested in ideas that might help me distinguish whether the problem is:
 # Driver sending work to the executor
 # Executor scheduling work
 # Executor performing work
 # Executor returning control to the driver

I'm not really sure how to exercise these different execution paths.
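
For concreteness, the driver-only job is shaped roughly like this (a sketch of 
the idea, not the actual job; it reuses the mkSourceDataFrame helper shown 
elsewhere in this ticket and stops at planning, so no tasks are ever shipped to 
executors):
{code:java}
import java.util.UUID

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical driver-only dummy job: same source and transform as the
// distributed dummy job, but it only forces analysis, optimization, and
// physical planning (all driver-side work) instead of running an action.
def driverOnlyDummyJob(sparkSession: SparkSession): Unit = {
  val df = mkSourceDataFrame(sparkSession)
    .withColumn("random", lit(UUID.randomUUID.toString))
  df.queryExecution.executedPlan // no job is submitted to the cluster
}
{code}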




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Summary: Regression in time executor takes to do work sometime after v3.0.1 
?  (was: Regression in time executor takes to do work since v3.0.1 ?)




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: dummy-job-query.png




[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518323#comment-17518323
 ] 

Danny Guinther commented on SPARK-38792:


Things move through the Spark UI for my application too fast to dwell on any 
one thing, but I happened to catch an execution of the dummy job, and I'm 
shocked by the durations I saw. I don't get what's going on, and the metrics in 
the UI aren't offering much help. I've attached a screenshot of the job page as 
dummy-job-job.jpg and a screenshot of the query related to that job as 
dummy-job-query.png.

 




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: dummy-job-job.jpg




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: min-time-way-up.jpg




[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: (was: min-time-way-up.jpg)




[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518273#comment-17518273
 ] 

Danny Guinther commented on SPARK-38792:


To further try to understand what is going on here, I created a very minimal 
dummy data flow that aims to eliminate more variables as to what could be wrong 
with my new Spark 3.2.1 deployment.

Instead of reading from a DB, the dummy dataframe is defined in code like so:
{code:java}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val Things = Seq[Row](
  Row("----", "Thing 0"),
  Row("----0001", "Thing 1"),
  Row("----0002", "Thing 2"),
  Row("----0003", "Thing 3"),
  Row("----0004", "Thing 4"),
  Row("----0005", "Thing 5"),
  Row("----0006", "Thing 6"),
  Row("----0007", "Thing 7"),
  Row("----0008", "Thing 8"),
  Row("----0009", "Thing 9"),
  Row("----000a", "Thing a"),
  Row("----000b", "Thing b"),
  Row("----000c", "Thing c"),
  Row("----000d", "Thing d"),
  Row("----000e", "Thing e"),
  Row("----000f", "Thing f")
)

// The source dataframe is built once and memoized. RecordSchema is defined
// elsewhere in the job (a schema matching the two string columns above).
private var sourceDataFrame: Option[DataFrame] = None

def mkSourceDataFrame(sparkSession: SparkSession): DataFrame = {
  sourceDataFrame match {
    case Some(srcDf) => srcDf
    case None =>
      val srcDf = sparkSession.createDataFrame(
        sparkSession.sparkContext.makeRDD(Things, 1),
        RecordSchema
      )
      sourceDataFrame = Some(srcDf)
      srcDf
  }
}
{code}
From there, I do a single simple transformation and then perform a count 
action roughly like so:
{code:java}
import java.util.UUID
import org.apache.spark.sql.functions.lit

mkSourceDataFrame(sparkSession)
  .withColumn("random", lit(UUID.randomUUID.toString))
  .count()
{code}
Even in this very simple case, I am seeing a drastic increase in the time taken 
to complete the job when upgrading from Spark 3.0.1 to Spark 3.2.1. The 
difference in the minimum time required is especially noteworthy. Please see 
the attached screenshot named min-time-way-up.jpg for a visual of the 
difference.
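
For context, the timing blocks mentioned in the issue description follow 
roughly this pattern (a sketch, not the production code; the label and println 
are stand-ins for our real metrics reporting):
{code:java}
// Sketch of the timing-block pattern: wrap only the dataframe action so
// nothing application-specific can inflate the measurement.
def timed[T](label: String)(action: => T): T = {
  val startNanos = System.nanoTime()
  try action
  finally {
    val elapsedMillis = (System.nanoTime() - startNanos) / 1000000
    println(s"$label: ${elapsedMillis}ms")
  }
}

val count = timed("dummy-count") {
  mkSourceDataFrame(sparkSession)
    .withColumn("random", lit(UUID.randomUUID.toString))
    .count()
}
{code}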

 

Help?

 

 


[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-06 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: min-time-way-up.jpg

> Regression in time executor takes to do work since v3.0.1 ?
> ---
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Danny Guinther
>Priority: Major
> Attachments: min-time-way-up.jpg, what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
> don't believe it is specific to my application since the upgrade to 3.0.1 to 
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my 
> application due to the high volume of work my application does, but I could 
> be mistaken.
> The gist is that it seems like the executor actions I'm running suddenly 
> appear to take a lot longer on Spark 3.2.1. I don't have any ability to test 
> versions between 3.0.1 and 3.2.1 because my application was previously 
> blocked from upgrading beyond Spark 3.0.1 by 
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this or metrics I might try to gather to pinpoint 
> the problem? I've tried a bunch of the suggestions from 
> [https://spark.apache.org/docs/latest/tuning.html] to see if any of those 
> help, but none of the adjustments I've tried have been fruitful. I also tried 
> to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] 
> for ideas as to what might have changed to cause this behavior, but haven't 
> seen anything that sticks out as being a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by 
> executor actions. In the image the blue and purple lines are different kinds 
> of reads using the built-in JDBC data reader and the green line is writes 
> using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 
> occurred at 9AM on the graph. The graph data comes from timing blocks that 
> surround only the calls to dataframe actions, so there shouldn't be anything 
> specific to my application that is suddenly inflating these numbers. The 
> specific actions I'm invoking are: count() (but there's some transforming and 
> caching going on, so it's really more than that); first(); and write().
> The driver process does seem to be seeing more GC churn than with Spark 
> 3.0.1, but I don't think that explains this behavior. The executors don't 
> seem to have any problem with memory or GC and are not overutilized (our 
> pipeline is very read and write heavy, less heavy on transformations, so 
> executors tend to be idle while waiting for various network I/O).
>  
> Thanks in advance for any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-05 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Description: 
Hello!

I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
don't believe it is specific to my application since the upgrade from 3.0.1 to 
3.2.1 is purely a configuration change. I'd guess it presents itself in my 
application due to the high volume of work my application does, but I could be 
mistaken.

The gist is that it seems like the executor actions I'm running suddenly appear 
to take a lot longer on Spark 3.2.1. I don't have any ability to test versions 
between 3.0.1 and 3.2.1 because my application was previously blocked from 
upgrading beyond Spark 3.0.1 by 
https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).

Any ideas what might cause this or metrics I might try to gather to pinpoint 
the problem? I've tried a bunch of the suggestions from 
[https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, 
but none of the adjustments I've tried have been fruitful. I also tried to look 
in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as 
to what might have changed to cause this behavior, but haven't seen anything 
that sticks out as being a possible source of the problem.

I have attached a graph that shows the drastic change in time taken by executor 
actions. In the image the blue and purple lines are different kinds of reads 
using the built-in JDBC data reader and the green line is writes using a 
custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 
9AM on the graph. The graph data comes from timing blocks that surround only 
the calls to dataframe actions, so there shouldn't be anything specific to my 
application that is suddenly inflating these numbers. The specific actions I'm 
invoking are: count() (but there's some transforming and caching going on, so 
it's really more than that); first(); and write().
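
For concreteness, the timing blocks are roughly of the following shape (a
minimal sketch only; the {{timed}} helper, metric names, and connection
details are illustrative, not the actual application code):

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch of the instrumentation. The timer wraps nothing but the
// call that triggers the dataframe action, so growth in these numbers
// comes from Spark itself, not from the surrounding application code.
def timed[T](metric: String)(action: => T): T = {
  val start = System.nanoTime()
  try action
  finally println(s"$metric took ${(System.nanoTime() - start) / 1000000} ms")
}

val spark = SparkSession.builder().getOrCreate()
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://example:1433")  // placeholder connection details
  .option("dbtable", "some_table")
  .load()

val rowCount = timed("jdbc-read-count") { df.cache(); df.count() }
val firstRow = timed("jdbc-read-first") { df.first() }
timed("custom-write") { df.write.format("noop").mode("overwrite").save() }  // stand-in for the custom writer
{code}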

The driver process does seem to be seeing more GC churn than with Spark 3.0.1, 
but I don't think that explains this behavior. The executors don't seem to have 
any problem with memory or GC and are not overutilized (our pipeline is very 
read and write heavy, less heavy on transformations, so executors tend to be 
idle while waiting for various network I/O).

 

Thanks in advance for any help!

  was:
Hello!

I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
don't believe it is specific to my application since the upgrade from 3.0.1 to 
3.2.1 is purely a configuration change. I'd guess it presents itself in my 
application due to the high volume of work my application does, but I could be 
mistaken.

The gist is that it seems like the executor actions I'm running suddenly appear 
to take a lot longer on Spark 3.2.1. I don't have any ability to test versions 
between 3.0.1 and 3.2.1 because my application was previously blocked from 
upgrading beyond Spark 3.0.1 by 
https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).

Any ideas what might cause this or metrics I might try to gather to pinpoint 
the problem? I've tried a bunch of the suggestions from 
[https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, 
but none of the adjustments I've tried have been fruitful. I also tried to look 
in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as 
to what might have changed to cause this behavior, but haven't seen anything 
that sticks out as being a possible source of the problem.

I have attached a graph that shows the drastic change in time taken by executor 
actions. In the image the blue and purple lines are different kinds of reads 
using the built-in JDBC data reader and the green line is writes using a 
custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 
9AM on the graph. The graph data comes from timing blocks that surround only 
the calls to dataframe actions, so there shouldn't be anything specific to my 
application that is suddenly inflating these numbers.

The driver process does seem to be seeing more GC churn than with Spark 3.0.1, 
but I don't think that explains this behavior. The executors don't seem to have 
any problem with memory or GC and are not overutilized (our pipeline is very 
read and write heavy, less heavy on transformations, so executors tend to be 
idle while waiting for various network I/O).

 

Thanks in advance for any help!


> Regression in time executor takes to do work since v3.0.1 ?
> ---
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792

[jira] [Comment Edited] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-05 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517704#comment-17517704
 ] 

Danny Guinther edited comment on SPARK-38792 at 4/5/22 9:09 PM:


I don't know if it is helpful, but the application runs in a hosted Databricks 
workspace in Azure.

I have tried deploying the upgrade to 3.2.1 several times in the last month and 
it behaves this way every time, so this is not just a fluke of bad timing.

I also tried adding 8 more executors w/ 4 cores each to try to help speed 
things up, but this had no obvious impact on throughput.


was (Author: danny-seismic):
I don't know if it is helpful, but the application runs in a hosted Databricks 
workspace in Azure.

I have tried deploying the upgrade to 3.2.1 several times in the last month and 
it behaves this way every time, so this is not just a fluke of bad timing.

> Regression in time executor takes to do work since v3.0.1 ?
> ---
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Danny Guinther
>Priority: Major
> Attachments: what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
> don't believe it is specific to my application since the upgrade from 3.0.1 to 
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my 
> application due to the high volume of work my application does, but I could 
> be mistaken.
> The gist is that it seems like the executor actions I'm running suddenly 
> appear to take a lot longer on Spark 3.2.1. I don't have any ability to test 
> versions between 3.0.1 and 3.2.1 because my application was previously 
> blocked from upgrading beyond Spark 3.0.1 by 
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this or metrics I might try to gather to pinpoint 
> the problem? I've tried a bunch of the suggestions from 
> [https://spark.apache.org/docs/latest/tuning.html] to see if any of those 
> help, but none of the adjustments I've tried have been fruitful. I also tried 
> to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] 
> for ideas as to what might have changed to cause this behavior, but haven't 
> seen anything that sticks out as being a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by 
> executor actions. In the image the blue and purple lines are different kinds 
> of reads using the built-in JDBC data reader and the green line is writes 
> using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 
> occurred at 9AM on the graph. The graph data comes from timing blocks that 
> surround only the calls to dataframe actions, so there shouldn't be anything 
> specific to my application that is suddenly inflating these numbers.
> The driver process does seem to be seeing more GC churn than with Spark 
> 3.0.1, but I don't think that explains this behavior. The executors don't 
> seem to have any problem with memory or GC and are not overutilized (our 
> pipeline is very read and write heavy, less heavy on transformations, so 
> executors tend to be idle while waiting for various network I/O).
>  
> Thanks in advance for any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-05 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517704#comment-17517704
 ] 

Danny Guinther commented on SPARK-38792:


I don't know if it is helpful, but the application runs in a hosted Databricks 
workspace in Azure.

I have tried deploying the upgrade to 3.2.1 several times in the last month and 
it behaves this way every time, so this is not just a fluke of bad timing.

> Regression in time executor takes to do work since v3.0.1 ?
> ---
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Danny Guinther
>Priority: Major
> Attachments: what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
> don't believe it is specific to my application since the upgrade from 3.0.1 to 
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my 
> application due to the high volume of work my application does, but I could 
> be mistaken.
> The gist is that it seems like the executor actions I'm running suddenly 
> appear to take a lot longer on Spark 3.2.1. I don't have any ability to test 
> versions between 3.0.1 and 3.2.1 because my application was previously 
> blocked from upgrading beyond Spark 3.0.1 by 
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this or metrics I might try to gather to pinpoint 
> the problem? I've tried a bunch of the suggestions from 
> [https://spark.apache.org/docs/latest/tuning.html] to see if any of those 
> help, but none of the adjustments I've tried have been fruitful. I also tried 
> to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] 
> for ideas as to what might have changed to cause this behavior, but haven't 
> seen anything that sticks out as being a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by 
> executor actions. In the image the blue and purple lines are different kinds 
> of reads using the built-in JDBC data reader and the green line is writes 
> using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 
> occurred at 9AM on the graph. The graph data comes from timing blocks that 
> surround only the calls to dataframe actions, so there shouldn't be anything 
> specific to my application that is suddenly inflating these numbers.
> The driver process does seem to be seeing more GC churn than with Spark 
> 3.0.1, but I don't think that explains this behavior. The executors don't 
> seem to have any problem with memory or GC and are not overutilized (our 
> pipeline is very read and write heavy, less heavy on transformations, so 
> executors tend to be idle while waiting for various network I/O).
>  
> Thanks in advance for any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-05 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Description: 
Hello!

I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
don't believe it is specific to my application since the upgrade from 3.0.1 to 
3.2.1 is purely a configuration change. I'd guess it presents itself in my 
application due to the high volume of work my application does, but I could be 
mistaken.

The gist is that it seems like the executor actions I'm running suddenly appear 
to take a lot longer on Spark 3.2.1. I don't have any ability to test versions 
between 3.0.1 and 3.2.1 because my application was previously blocked from 
upgrading beyond Spark 3.0.1 by 
https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).

Any ideas what might cause this or metrics I might try to gather to pinpoint 
the problem? I've tried a bunch of the suggestions from 
[https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, 
but none of the adjustments I've tried have been fruitful. I also tried to look 
in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as 
to what might have changed to cause this behavior, but haven't seen anything 
that sticks out as being a possible source of the problem.

I have attached a graph that shows the drastic change in time taken by executor 
actions. In the image the blue and purple lines are different kinds of reads 
using the built-in JDBC data reader and the green line is writes using a 
custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 
9AM on the graph. The graph data comes from timing blocks that surround only 
the calls to dataframe actions, so there shouldn't be anything specific to my 
application that is suddenly inflating these numbers.

The driver process does seem to be seeing more GC churn than with Spark 3.0.1, 
but I don't think that explains this behavior. The executors don't seem to have 
any problem with memory or GC and are not overutilized (our pipeline is very 
read and write heavy, less heavy on transformations, so executors tend to be 
idle while waiting for various network I/O).

 

Thanks in advance for any help!

  was:
Hello!

I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
don't believe it is specific to my application since the upgrade from 3.0.1 to 
3.2.1 is purely a configuration change. I'd guess it presents itself in my 
application due to the high volume of work my application does, but I could be 
mistaken.

The gist is that it seems like the executor actions I'm running suddenly appear 
to take a lot longer on Spark 3.2.1. I don't have any ability to test versions 
between 3.0.1 and 3.2.1 because my application was previously blocked from 
upgrading beyond Spark 3.0.1 by 
https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).

Any ideas what might cause this or metrics I might try to gather to pinpoint 
the problem? I've tried a bunch of the suggestions from 
[https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, 
but none of the adjustments I've tried have been fruitful. I also tried to look 
in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as 
to what might have changed to cause this behavior, but haven't seen anything 
that sticks out as being a possible source of the problem.

I have attached a graph that shows the drastic change in time taken by executor 
actions. In the image the blue and purple lines are different kinds of reads 
using the built-in JDBC data reader and the green line is writes using a 
custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 
9AM on the graph.

 

Thanks in advance for any help!


> Regression in time executor takes to do work since v3.0.1 ?
> ---
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Danny Guinther
>Priority: Major
> Attachments: what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
> don't believe it is specific to my application since the upgrade from 3.0.1 to 
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my 
> application due to the high volume of work my application does, but I could 
> be mistaken.
> The gist is that it seems like the executor actions I'm running suddenly 
> appear to take a lot longer on Spark 3.2.1.

[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-05 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-38792:
---
Attachment: what-s-up-with-exec-actions.jpg

> Regression in time executor takes to do work since v3.0.1 ?
> ---
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Danny Guinther
>Priority: Major
> Attachments: what-s-up-with-exec-actions.jpg
>
>
> Hello!
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
> performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
> don't believe it is specific to my application since the upgrade from 3.0.1 to 
> 3.2.1 is purely a configuration change. I'd guess it presents itself in my 
> application due to the high volume of work my application does, but I could 
> be mistaken.
> The gist is that it seems like the executor actions I'm running suddenly 
> appear to take a lot longer on Spark 3.2.1. I don't have any ability to test 
> versions between 3.0.1 and 3.2.1 because my application was previously 
> blocked from upgrading beyond Spark 3.0.1 by 
> https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
> Any ideas what might cause this or metrics I might try to gather to pinpoint 
> the problem? I've tried a bunch of the suggestions from 
> [https://spark.apache.org/docs/latest/tuning.html] to see if any of those 
> help, but none of the adjustments I've tried have been fruitful. I also tried 
> to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] 
> for ideas as to what might have changed to cause this behavior, but haven't 
> seen anything that sticks out as being a possible source of the problem.
> I have attached a graph that shows the drastic change in time taken by 
> executor actions. In the image the blue and purple lines are different kinds 
> of reads using the built-in JDBC data reader and the green line is writes 
> using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 
> occurred at 9AM on the graph.
>  
> Thanks in advance for any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?

2022-04-05 Thread Danny Guinther (Jira)
Danny Guinther created SPARK-38792:
--

 Summary: Regression in time executor takes to do work since v3.0.1 ?
 Key: SPARK-38792
 URL: https://issues.apache.org/jira/browse/SPARK-38792
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: Danny Guinther
 Attachments: what-s-up-with-exec-actions.jpg

Hello!

I'm sorry to trouble you with this, but I'm seeing a noticeable regression in 
performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I 
don't believe it is specific to my application since the upgrade from 3.0.1 to 
3.2.1 is purely a configuration change. I'd guess it presents itself in my 
application due to the high volume of work my application does, but I could be 
mistaken.

The gist is that it seems like the executor actions I'm running suddenly appear 
to take a lot longer on Spark 3.2.1. I don't have any ability to test versions 
between 3.0.1 and 3.2.1 because my application was previously blocked from 
upgrading beyond Spark 3.0.1 by 
https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).

Any ideas what might cause this or metrics I might try to gather to pinpoint 
the problem? I've tried a bunch of the suggestions from 
[https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, 
but none of the adjustments I've tried have been fruitful. I also tried to look 
in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as 
to what might have changed to cause this behavior, but haven't seen anything 
that sticks out as being a possible source of the problem.

I have attached a graph that shows the drastic change in time taken by executor 
actions. In the image the blue and purple lines are different kinds of reads 
using the built-in JDBC data reader and the green line is writes using a 
custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 
9AM on the graph.

 

Thanks in advance for any help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001

2021-12-22 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464042#comment-17464042
 ] 

Danny Guinther commented on SPARK-37391:


I've created three PRs to facilitate backporting this change to 3.1.3:


master PR: [https://github.com/apache/spark/pull/34745]

branch-3.1 PR: [https://github.com/apache/spark/pull/34988]

branch-3.2 PR: [https://github.com/apache/spark/pull/34989] 

> SIGNIFICANT bottleneck introduced by fix for SPARK-32001
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
> Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg
>
>
> The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37715) Docker integration tests: Tweak docs and remove unneeded dependency

2021-12-22 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463879#comment-17463879
 ] 

Danny Guinther commented on SPARK-37715:


PR here: https://github.com/apache/spark/pull/34979

> Docker integration tests: Tweak docs and remove unneeded dependency
> ---
>
> Key: SPARK-37715
> URL: https://issues.apache.org/jira/browse/SPARK-37715
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Danny Guinther
>Priority: Trivial
>
> These are a couple of changes I found worthwhile while running the Docker 
> integration tests for [#34745|https://github.com/apache/spark/pull/34745].
> The doc changes are minor fixes to add the missing repository to the 
> suggested command to run the Docker integration tests.
> The library change relates to this comment: [#34745 
> (comment)|https://github.com/apache/spark/pull/34745#discussion_r773084417]; 
> I don't know if my testing was thorough enough, but I found that the 
> referenced dependency was not needed and could be removed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37715) Docker integration tests: Tweak docs and remove unneeded dependency

2021-12-22 Thread Danny Guinther (Jira)
Danny Guinther created SPARK-37715:
--

 Summary: Docker integration tests: Tweak docs and remove unneeded 
dependency
 Key: SPARK-37715
 URL: https://issues.apache.org/jira/browse/SPARK-37715
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.3.0
Reporter: Danny Guinther


These are a couple of changes I found worthwhile while running the Docker 
integration tests for [#34745|https://github.com/apache/spark/pull/34745].

The doc changes are minor fixes to add the missing repository to the suggested 
command to run the Docker integration tests.

The library change relates to this comment: [#34745 
(comment)|https://github.com/apache/spark/pull/34745#discussion_r773084417]; I 
don't know if my testing was thorough enough, but I found that the referenced 
dependency was not needed and could be removed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001

2021-11-22 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447529#comment-17447529
 ] 

Danny Guinther commented on SPARK-37391:


Here's an example stacktrace for one of the blocked threads:

{{org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:92)}}
{{org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:63)}}
{{org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$$Lambda$6294/1994845663.apply(Unknown
 Source)}}
{{org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)}}
{{org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)}}
{{org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)}}
{{org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)}}
{{org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:444)}}
{{org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:400)}}
{{org.apache.spark.sql.DataFrameReader$$Lambda$6224/1118373872.apply(Unknown 
Source)}}
{{scala.Option.getOrElse(Option.scala:189)}}
{{org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:400)}}
{{org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:273)}}
{{scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
{{scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)}}
{{scala.concurrent.Future$$$Lambda$442/341778327.apply(Unknown Source)}}
{{scala.util.Success.$anonfun$map$1(Try.scala:255)}}
{{scala.util.Success.map(Try.scala:213)}}
{{scala.concurrent.Future.$anonfun$map$1(Future.scala:292)}}
{{scala.concurrent.Future$$Lambda$443/424848797.apply(Unknown Source)}}
{{scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)}}
{{scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)}}
{{scala.concurrent.impl.Promise$$Lambda$444/1710905079.apply(Unknown Source)}}
{{scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)}}
{{java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
{{java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
{{java.lang.Thread.run(Thread.java:748)}}

 

 

The stacktrace from the thread that is holding the lock looks like so:

{{java.net.SocketInputStream.socketRead0(Native Method)}}
{{java.net.SocketInputStream.socketRead(SocketInputStream.java:116)}}
{{java.net.SocketInputStream.read(SocketInputStream.java:171)}}
{{java.net.SocketInputStream.read(SocketInputStream.java:141)}}
{{com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1019)}}
{{com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1009)}}
{{sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:476)}}
{{sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:470)}}
{{sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)}}
{{sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1364)}}
{{sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)}}
{{sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:973)}}
{{com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2058)}}
{{com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6617) => 
holding Monitor(com.microsoft.sqlserver.jdbc.TDSReader@1035497922)}}
{{com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7805)}}
{{com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7768)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5332)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:4066)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:85)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:4004)}}
{{com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272)
 => holding Monitor(java.lang.Object@564746804)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2768)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2418)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2265)}}
{{com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1291)}}
{{com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:881)}}
{{org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(

[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001

2021-11-22 Thread Danny Guinther (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447512#comment-17447512
 ] 

Danny Guinther commented on SPARK-37391:


[~hyukjin.kwon], sorry, I seem to have gotten confused when identifying the 
source of the regression. I have updated the title and description to reflect 
the true source of the issue. I'm inclined to blame this change: 
https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58

 

I'm sorry, but I don't have the capacity to provide a self-contained 
reproduction of the issue. Hopefully the problem is obvious enough that you 
will be able to see what is going on from the anecdotal evidence I can provide.

The introduction of SecurityConfigurationLock.synchronized prevents a given 
JDBC Driver from establishing more than one connection at a time (or at least 
severely limits the concurrency). This is a significant bottleneck for 
applications that use a single JDBC driver to establish many database 
connections.
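
For illustration, the pattern looks roughly like this (a simplified sketch of
the shape of the problem, not the actual Spark source):

{code:scala}
// Simplified sketch (not the actual Spark source): every JDBC connection,
// for every provider, has to pass through a single global lock on the driver.
object SecurityConfigurationLock

def create(driver: java.sql.Driver, url: String): java.sql.Connection =
  SecurityConfigurationLock.synchronized {
    // The JAAS security configuration is swapped in, the connection is
    // established, then the configuration is restored. A slow login (network
    // I/O to the database) holds the lock for its full duration, so every
    // other connecting thread queues behind it.
    driver.connect(url, new java.util.Properties())
  }
{code}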

The anecdotal evidence I can offer to support this claim:

1. I've attached a screenshot of some dashboards we use to monitor the QA 
deployment of the application in question. These graphs come from a 4.5-hour 
window where I had Spark 3.1.2 deployed to QA. On the left side of the graph we 
were running Spark 2.4.5; in the middle we were running Spark 3.1.2; and on the 
right side of the graph we are running Spark 3.0.1.
 # The "Success Rate", "CountActiveTasks", "CountActiveJobs", 
"CountTableTenantJobStart", "CountTableTenantJobEnd" graphs all aim to 
demonstrate that with the deployment of Spark 3.1.2 the throughput of the 
application was significantly reduced across the board.
 # The "Overall Active Thread Count", "Count Active Executors", and 
"CountDeadExecutors" graphs all aim to show that there was no change in the 
number of resources allocated to do work.
 # The "Max MinsSinceLastAttempt" graph should normally be a flat line unless 
the application is falling behind on the work that it is scheduled to do. It 
can be seen that during the period of the Spark 3.1.2 deployment the 
application falls behind at a linear rate and begins to recover once Spark 
3.0.1 is deployed.

!spark-regression-dashes.jpg!

 

2. I've attached a screenshot of the thread dump from the Spark driver process. 
It can be seen that many, many threads are blocked waiting for 
SecurityConfigurationLock. The screenshot only shows a handful of threads, but 
there are 98 threads in total blocked waiting for the SecurityConfigurationLock.

!so-much-blocking.jpg!

 

It's worth noting that our QA deployment does significantly less work than our 
production deployment; if the QA deployment can't keep up then the production 
deployment has no chance. On the bright side, I had success updating the 
production deployment to Spark 3.0.1 and that seems to be stable. 
Unfortunately, we use Databricks as our Spark vendor, and the LTS release they 
have that supports Spark 3.0.1 is only scheduled to be maintained until 
September 2022, so we can't avoid this regression forever.

 

If I can answer any questions or provide any more info, please let me know. 
Thanks in advance!

 

> SIGNIFICANT bottleneck introduced by fix for SPARK-32001
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
> Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg
>
>
> The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001

2021-11-22 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Attachment: so-much-blocking.jpg

> SIGNIFICANT bottleneck introduced by fix for SPARK-32001
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
> Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg
>
>
> The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001

2021-11-22 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Attachment: spark-regression-dashes.jpg

> SIGNIFICANT bottleneck introduced by fix for SPARK-32001
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
> Attachments: spark-regression-dashes.jpg
>
>
> The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001

2021-11-22 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Summary: SIGNIFICANT bottleneck introduced by fix for SPARK-32001  (was: 
SIGNIFICANT bottleneck introduced by fix for SPARK-34497)

> SIGNIFICANT bottleneck introduced by fix for SPARK-32001
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-22 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Description: 
The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
[https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
 ) does not seem to have considered the reality that some apps may rely on being 
able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!

  was:
The fix for SPARK-34497 ( 
[https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58|https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
 ) does not seem to have considered the reality that some apps may rely on being 
able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!


> SIGNIFICANT bottleneck introduced by fix for SPARK-34497
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-22 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Description: 
The fix for SPARK-34497 ( 
[https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58|https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
 ) does not seem to have considered the reality that some apps may rely on being 
able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!

  was:
The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/31622] ) does 
not seem to have considered the reality that some apps may rely on being able to 
establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!


> SIGNIFICANT bottleneck introduced by fix for SPARK-34497
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for SPARK-34497 ( 
> [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58|https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58]
>  ) does not seem to have considered the reality that some apps may rely on 
> being able to establish many JDBC connections simultaneously for performance 
> reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-19 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Description: 
The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/31622] ) does 
not seem to have considered the reality that some apps may rely on being able to 
establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!

  was:
The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622)] does not 
seem to have considered the reality that some apps may rely on being able to 
establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!


> SIGNIFICANT bottleneck introduced by fix for SPARK-34497
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/31622] ) does 
> not seem to have considered the reality that some apps may rely on being able 
> to establish many JDBC connections simultaneously for performance reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-19 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Description: 
The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622)] does not 
seem to have considered the reality that some apps may rely on being able to 
establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!

  was:
The fix for SPARK-34497 does not seem to have considered the reality that some 
apps may rely on being able to establish many JDBC connections simultaneously 
for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!


> SIGNIFICANT bottleneck introduced by fix for SPARK-34497
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622)] does 
> not seem to have considered the reality that some apps may rely on being able 
> to establish many JDBC connections simultaneously for performance reasons.
> The fix forces concurrency to 1 when establishing database connections and 
> that strikes me as a *significant* user-impacting change and a *significant* 
> bottleneck.
> Can anyone propose a workaround for this? I have an app that makes 
> connections to thousands of databases and I can't upgrade to any version 
> >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-19 Thread Danny Guinther (Jira)
Danny Guinther created SPARK-37391:
--

 Summary: SIGNIFICANT bottleneck introduced by fix for SPARK-34497
 Key: SPARK-37391
 URL: https://issues.apache.org/jira/browse/SPARK-37391
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0
 Environment: N/A
Reporter: Danny Guinther


The fix for SPARK-34497 does not seem to have considered the reality that some 
apps may rely on being able to establish many JDBC connections simultaneously 
for performance reasons.

The fix forces concurrency to 1 when establishing database connections and that 
strikes me as a *significant* user-impacting change and a *significant* 
bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections 
to thousands of databases and I can't upgrade to any version >3.1.x because of 
this significant bottleneck.

 

Thanks in advance for your help!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC

2019-04-25 Thread Danny Guinther (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826378#comment-16826378
 ] 

Danny Guinther commented on SPARK-19335:


Any update on this?

Also, please forgive this dumb question, but I'm shocked that there's not more 
demand for this feature, which makes me wonder if I have major misconceptions 
about Spark and its intended use. How do users survive without this 
functionality? I take it that the destination SQL database should have flexible 
uptime requirements allowing for drastic changes? The overwrite save mode is 
the only thing that offers anything like an UPDATE, but totally 
dropping/truncating the destination table seems inconceivable for many 
production environments. What am I missing?
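
For what it's worth, the only workaround I'm aware of is to bypass the JDBC
writer entirely and do the upsert by hand with foreachPartition, roughly like
this (a rough sketch, not a Spark API; the table, columns, and connection
details are made up, and the ON CONFLICT syntax is Postgres-specific):

{code:scala}
import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, Row}

// Rough workaround sketch: open a connection per partition and issue a
// vendor-specific upsert statement by hand instead of using the JDBC writer.
def upsert(df: DataFrame, url: String, user: String, password: String): Unit =
  df.foreachPartition { (rows: Iterator[Row]) =>
    val conn = DriverManager.getConnection(url, user, password)
    val stmt = conn.prepareStatement(
      "INSERT INTO target (id, val) VALUES (?, ?) " +
      "ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val")
    try {
      rows.foreach { row =>
        stmt.setLong(1, row.getLong(0))
        stmt.setString(2, row.getString(1))
        stmt.addBatch()
      }
      stmt.executeBatch() // one round trip per partition, not per row
    } finally {
      stmt.close()
      conn.close()
    }
  }
{code}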

> Spark should support doing an efficient DataFrame Upsert via JDBC
> -
>
> Key: SPARK-19335
> URL: https://issues.apache.org/jira/browse/SPARK-19335
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ilya Ganelin
>Priority: Minor
>
> Doing a database update, as opposed to an insert is useful, particularly when 
> working with streaming applications which may require revisions to previously 
> stored data. 
> Spark DataFrames/DataSets do not currently support an Update feature via the 
> JDBC Writer allowing only Overwrite or Append.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org