[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521852#comment-17521852 ] Danny Guinther commented on SPARK-38792:

I'm getting the impression that the problem may be with some code that Databricks bolts onto Spark. I'd say ignore this ticket unless you hear otherwise.

> Regression in time executor takes to do work sometime after v3.0.1 ?
>
> Key: SPARK-38792
> URL: https://issues.apache.org/jira/browse/SPARK-38792
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Danny Guinther
> Priority: Major
> Attachments: dummy-job-job.jpg, dummy-job-query.png, executor-timing-debug-number-2.jpg, executor-timing-debug-number-4.jpg, executor-timing-debug-number-5.jpg, min-time-way-up.jpg, what-is-this-code.jpg, what-s-up-with-exec-actions.jpg
>
> Hello!
>
> I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1, and I can't pin down why. I don't believe it is specific to my application, since the upgrade from 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents itself in my application due to the high volume of work my application does, but I could be mistaken.
>
> The gist is that the executor actions I'm running suddenly appear to take a lot longer on Spark 3.2.1. I don't have any ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix).
>
> Any ideas what might cause this, or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. I also looked through [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but haven't seen anything that sticks out as a possible source of the problem.
>
> I have attached a graph that shows the drastic change in time taken by executor actions. In the image, the blue and purple lines are different kinds of reads using the built-in JDBC data reader, and the green line is writes using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 9 AM on the graph. The graph data comes from timing blocks that surround only the calls to dataframe actions, so there shouldn't be anything specific to my application that is suddenly inflating these numbers. The specific actions I'm invoking are: count() (but there's some transforming and caching going on, so it's really more than that); first(); and write().
>
> The driver process does seem to be seeing more GC churn than with Spark 3.0.1, but I don't think that explains this behavior. The executors don't seem to have any problem with memory or GC and are not overutilized (our pipeline is very read and write heavy, less heavy on transformations, so executors tend to be idle while waiting for various network I/O).
>
> Thanks in advance for any help!
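For context, the "timing blocks" mentioned in the description are wall-clock wrappers around single dataframe actions. A minimal sketch of that shape, where the timed helper and metric labels are hypothetical rather than taken from the actual application:
{code:java}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: wall-clock timing around a single DataFrame action,
// mirroring the "timing blocks that surround only the calls to dataframe actions".
def timed[T](label: String)(action: => T): T = {
  val start = System.nanoTime()
  try action
  finally {
    val elapsedMs = (System.nanoTime() - start) / 1000000
    // In the real application this presumably goes to a metrics backend (e.g. New Relic).
    println(s"$label took ${elapsedMs}ms")
  }
}

// Usage against an already-built DataFrame `df`:
// val n   = timed("jdbc-read-count") { df.count() }
// val row = timed("jdbc-read-first") { df.first() }
// timed("custom-write") { df.write.format("noop").mode("overwrite").save() }
{code}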
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: what-is-this-code.jpg
[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521342#comment-17521342 ] Danny Guinther commented on SPARK-38792:

Where does org.apache.spark.sql.execution.collect.Collector live? I can't find it, and New Relic suggests that the problem may stem from some classes in org.apache.spark.sql.execution.collect.*. See the attached screenshot named what-is-this-code.jpg.
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: executor-timing-debug-number-4.jpg
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: executor-timing-debug-number-2.jpg
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: executor-timing-debug-number-5.jpg
[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518998#comment-17518998 ] Danny Guinther commented on SPARK-38792:

I added yet another kind of dummy job that aims to measure:
# Time between the driver preparing a dataframe to read and adding a transform to said dataframe
# Time between the driver adding a literal column to the dataframe and the executor seeing that literal column
# Time between the executor seeing the literal column and adding another column via UDF
# Time between the executor adding another column via UDF and control being returned to the driver
# Time between the driver calling first() on the dataframe and control returning to the driver

The code looks something like this:
{code:java}
import java.time.Instant

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{lit, udf}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val RecordSchema = StructType(Seq(
  StructField("driverReadEpochMillis", LongType, false)
))

def mkSourceDataFrame(sparkSession: SparkSession): DataFrame = {
  sparkSession.createDataFrame(
    sparkSession.sparkContext.makeRDD(Seq[Row](Row(Instant.now.toEpochMilli)), 1),
    RecordSchema
  )
}

val df = mkSourceDataFrame(sparkSession)
  .withColumn("driverTransformEpochMillis", lit(Instant.now.toEpochMilli))
  .withColumn("executorTransformEpochMillis", (udf { () => Instant.now.toEpochMilli })())
  .withColumn("executorTransformAgainEpochMillis", (udf { () => Instant.now.toEpochMilli })())

val count = df.count
val beforeFirstEpochMillis = Instant.now.toEpochMilli
val row = df.first
val afterFirstEpochMillis = Instant.now.toEpochMilli

val driverReadEpochMillis = row.getAs[Long]("driverReadEpochMillis")
val driverTransformEpochMillis = row.getAs[Long]("driverTransformEpochMillis")
val executorTransformEpochMillis = row.getAs[Long]("executorTransformEpochMillis")
val executorTransformAgainEpochMillis = row.getAs[Long]("executorTransformAgainEpochMillis")

val statsEvent = Map[String, Any](
  "event" -> "executor-timing-debug",
  "driverReadEpochMillis" -> driverReadEpochMillis,
  "driverTransformEpochMillis" -> driverTransformEpochMillis,
  "driverReadToDriverTransformDeltaMillis" -> (driverTransformEpochMillis - driverReadEpochMillis), // #1
  "executorTransformEpochMillis" -> executorTransformEpochMillis,
  "driverTransformToExecutorTransformDeltaMillis" -> (executorTransformEpochMillis - driverTransformEpochMillis), // #2
  "executorTransformAgainEpochMillis" -> executorTransformAgainEpochMillis,
  "executorTransformToExecutorTransformAgainDeltaMillis" -> (executorTransformAgainEpochMillis - executorTransformEpochMillis), // #3
  "driverBeforeFirstEpochMillis" -> beforeFirstEpochMillis,
  "executorTransformAgainToDriverBeforeFirstDeltaMillis" -> (beforeFirstEpochMillis - executorTransformAgainEpochMillis), // #4
  "driverAfterFirstEpochMillis" -> afterFirstEpochMillis,
  "driverBeforeFirstEpochMillisToDriverAfterFirstDeltaMillis" -> (afterFirstEpochMillis - beforeFirstEpochMillis) // #5
)
{code}
I think the results of running this job on Spark 3.0.1 vs. Spark 3.2.1 help narrow down the source of the problem.
The results are as follows:
# Time between the driver preparing a dataframe to read and adding a transform to said dataframe
## NO CHANGE
# Time between the driver adding a literal column to the dataframe and the executor seeing that literal column
## DEFINITE REGRESSION: see the attached screenshot named executor-timing-debug-number-2.jpg
# Time between the executor seeing the literal column and adding another column via UDF
## NO CHANGE
# Time between the executor adding another column via UDF and control being returned to the driver
## DEFINITE REGRESSION: see the attached screenshot named executor-timing-debug-number-4.jpg
## I renamed some metrics, so the names are a little inconsistent with the example code above, but I assure you this is the right data. I added executorTransformAgainEpochMillis after the fact and had to rename this metric at that point; the screenshot is from before the rename.
# Time between the driver calling first() on the dataframe and control returning to the driver
## DEFINITE REGRESSION: see the attached screenshot named executor-timing-debug-number-5.jpg

The commonality among the metrics that suffered regressions is that they are all points where control transfers from the driver to the executor and/or from the executor back to the driver.

I wonder whether this is something I can replicate on versions of Spark between 3.0.1 and 3.2.1 despite the bottleneck caused by https://issues.apache.org/jira/browse/SPARK-37391; I'll give it a try.

Is there anybody out there?
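One way to cross-check those wall-clock deltas against Spark's own accounting is the driver's monitoring REST API, which exposes per-stage timing. A rough sketch follows; it assumes the default UI port 4040 on the driver, a still-running application, and a `sparkSession` in scope, none of which is taken from the actual job above:
{code:java}
import scala.io.Source

// Pull Spark's own per-stage timing from the monitoring REST API and compare it with the
// wall-clock deltas measured above. Stage data includes fields such as executorRunTime,
// executorDeserializeTime, resultSerializationTime, submissionTime, firstTaskLaunchedTime,
// and completionTime, which split "driver -> executor -> driver" into Spark's own buckets.
val appId = sparkSession.sparkContext.applicationId
val url = s"http://localhost:4040/api/v1/applications/$appId/stages?status=complete"
val src = Source.fromURL(url)
try println(src.mkString.take(2000)) finally src.close()
{code}
If the regression shows up in the gap between submission and first task launch rather than in executorRunTime, that would point at scheduling/handoff rather than at the work itself.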
[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518460#comment-17518460 ] Danny Guinther commented on SPARK-38792:

I added another kind of dummy job that doesn't farm work out to the executors at all and instead runs entirely on the driver while exercising most of the code paths that a normal data flow would. Interestingly, it seems unaffected by the upgrade from 3.0.1 to 3.2.1, which suggests that the issue is strongly related to passing work to the executor or to the executor doing work.

I'd be interested in ideas that might help me distinguish whether the problem is:
# The driver sending work to the executor
# The executor scheduling work
# The executor performing work
# The executor returning control to the driver

I'm not really sure how to exercise these different execution paths.
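One possible way to start separating those phases, sketched here with the standard SparkListener task events and metrics rather than anything from the application in question, is to log per task how much time Spark itself attributes to deserializing, running, and serializing results, and compare that with the wall-clock gap the driver observes:
{code:java}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// For each finished task, compare the driver-observed wall clock (finishTime - launchTime)
// with the executor-side buckets Spark records itself:
//   executorDeserializeTime  - executor unpacking the task
//   executorRunTime          - executor doing the work
//   resultSerializationTime  - executor packaging the result
// Whatever is left over ("unaccounted") is roughly scheduler delay plus shipping the
// result back, i.e. the driver/executor handoff cost.
sparkSession.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      val wallClockMs = info.finishTime - info.launchTime
      val accountedMs =
        metrics.executorDeserializeTime + metrics.executorRunTime + metrics.resultSerializationTime
      println(
        s"stage=${taskEnd.stageId} task=${info.taskId} wallClockMs=$wallClockMs " +
        s"deserializeMs=${metrics.executorDeserializeTime} runMs=${metrics.executorRunTime} " +
        s"resultSerMs=${metrics.resultSerializationTime} unaccountedMs=${wallClockMs - accountedMs}"
      )
    }
  }
})
{code}
If the unaccounted portion is what grew between 3.0.1 and 3.2.1, that would point at the handoff rather than the executor's work; if executorRunTime grew, the work itself got slower.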
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work sometime after v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Summary: Regression in time executor takes to do work sometime after v3.0.1 ? (was: Regression in time executor takes to do work since v3.0.1 ?)
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: dummy-job-query.png
[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518323#comment-17518323 ] Danny Guinther commented on SPARK-38792:

Things move through the Spark UI for my application too fast to dwell on any one thing, but I happened to catch an execution of the dummy job, and I'm shocked by the durations I saw. I don't get what's going on, and the metrics in the UI aren't offering much help. I've attached a screenshot of the job page as dummy-job-job.jpg and a screenshot of the query related to that job as dummy-job-query.png.
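One way to keep those UI pages around long enough to study, sketched here under the assumption of a writable shared log directory rather than anything from the actual deployment, is to enable Spark's event log and browse it afterwards through the history server:
{code:java}
import org.apache.spark.sql.SparkSession

// With event logging enabled, every job/stage/SQL page that scrolls past in the live UI
// can be re-opened later from the history server (sbin/start-history-server.sh), pointed
// at the same directory via spark.history.fs.logDirectory.
val sparkSession = SparkSession.builder()
  .appName("executor-timing-debug")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///tmp/spark-events") // assumed path
  .getOrCreate()
{code}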
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: dummy-job-job.jpg
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: min-time-way-up.jpg
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: (was: min-time-way-up.jpg)
[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518273#comment-17518273 ] Danny Guinther commented on SPARK-38792:

To further try to understand what is going on here, I created a very minimal dummy data flow that aims to eliminate more variables as to what could be wrong with my new Spark 3.2.1 deployment. Instead of reading from a DB, the dummy dataframe is codified like so:
{code:java}
val Things = Seq[Row](
  Row("----", "Thing 0"),
  Row("----0001", "Thing 1"),
  Row("----0002", "Thing 2"),
  Row("----0003", "Thing 3"),
  Row("----0004", "Thing 4"),
  Row("----0005", "Thing 5"),
  Row("----0006", "Thing 6"),
  Row("----0007", "Thing 7"),
  Row("----0008", "Thing 8"),
  Row("----0009", "Thing 9"),
  Row("----000a", "Thing a"),
  Row("----000b", "Thing b"),
  Row("----000c", "Thing c"),
  Row("----000d", "Thing d"),
  Row("----000e", "Thing e"),
  Row("----000f", "Thing f")
)

// Cache the single-partition source dataframe so it is only built once.
private var sourceDataFrame: Option[DataFrame] = None

def mkSourceDataFrame(sparkSession: SparkSession): DataFrame = {
  sourceDataFrame match {
    case Some(srcDf) => srcDf
    case None =>
      val srcDf = sparkSession.createDataFrame(
        sparkSession.sparkContext.makeRDD(Things, 1),
        RecordSchema // a two-column string schema defined elsewhere in the job
      )
      sourceDataFrame = Some(srcDf)
      srcDf
  }
}
{code}
From there, I do a single simple transformation and then perform a count action, roughly like so:
{code:java}
mkSourceDataFrame(sparkSession)
  .withColumn("random", lit(UUID.randomUUID.toString))
  .count()
{code}
Even in this very simple case, I am seeing a drastic increase in the time taken to complete the job when upgrading from Spark 3.0.1 to Spark 3.2.1. The difference in the minimum time required is especially noteworthy. Please see the attached screenshot named min-time-way-up.jpg for a visual of the difference.

Help?
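The snippets above also reference a RecordSchema that isn't shown in this comment, along with a few imports. For anyone trying to reproduce the dummy job, something like the following should slot in; the field names are assumptions, not the ones from the actual application:
{code:java}
import java.util.UUID

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical stand-in for the schema used by mkSourceDataFrame above:
// two non-nullable string columns matching the (id, name) shape of Things.
val RecordSchema = StructType(Seq(
  StructField("id", StringType, false),
  StructField("name", StringType, false)
))
{code}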
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792:
---
Attachment: min-time-way-up.jpg
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792: --- Description: Hello! I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't believe it is specific to my application since the upgrade to 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents itself in my application due to the high volume of work my application does, but I could be mistaken. The gist is that it seems like the executor actions I'm running suddenly appear to take a lot longer on Spark 3.2.1. I don't have any ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). Any ideas what might cause this or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. I also tried to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but haven't seen anything that sticks out as being a possible source of the problem. I have attached a graph that shows the drastic change in time taken by executor actions. In the image the blue and purple lines are different kinds of reads using the built-in JDBC data reader and the green line is writes using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 9AM on the graph. The graph data comes from timing blocks that surround only the calls to dataframe actions, so there shouldn't be anything specific to my application that is suddenly inflating these numbers. The specific actions I'm invoking are: count() (but there's some transforming and caching going on, so it's really more than that); first(); and write(). The driver process does seem to be seeing more GC churn then with Spark 3.0.1, but I don't think that explains this behavior. The executors don't seem to have any problem with memory or GC and are not overutilized (our pipeline is very read and write heavy, less heavy on transformations, so executors tend to be idle while waiting for various network I/O). Thanks in advance for any help! was: Hello! I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't believe it is specific to my application since the upgrade to 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents itself in my application due to the high volume of work my application does, but I could be mistaken. The gist is that it seems like the executor actions I'm running suddenly appear to take a lot longer on Spark 3.2.1. I don't have any ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). Any ideas what might cause this or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. 
I also tried to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but haven't seen anything that sticks out as being a possible source of the problem. I have attached a graph that shows the drastic change in time taken by executor actions. In the image the blue and purple lines are different kinds of reads using the built-in JDBC data reader and the green line is writes using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 9AM on the graph. The graph data comes from timing blocks that surround only the calls to dataframe actions, so there shouldn't be anything specific to my application that is suddenly inflating these numbers. The driver process does seem to be seeing more GC churn then with Spark 3.0.1, but I don't think that explains this behavior. The executors don't seem to have any problem with memory or GC and are not overutilized (our pipeline is very read and write heavy, less heavy on transformations, so executors tend to be idle while waiting for various network I/O). Thanks in advance for any help! > Regression in time executor takes to do work since v3.0.1 ? > --- > > Key: SPARK-38792 > URL: https://issues.apache.org/jira/browse/SPARK
[jira] [Comment Edited] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517704#comment-17517704 ] Danny Guinther edited comment on SPARK-38792 at 4/5/22 9:09 PM: I don't know if it is helpful, but the runtime environment that the application is running in is a hosted Databricks workspace running in Azure. I have tried deploying the upgrade to 3.2.1 several times in the last month and it behaves this way every time, so this is not just a fluke of bad timing. I also tried adding 8 more executors w/ 4 cores each to try to help speed things up and this had no obvious impact on throughput. was (Author: danny-seismic): I don't know if it is helpful, but the runtime environment that the application is running in is a hosted Databricks workspace running in Azure. I have tried deploying the upgrade to 3.2.1 several times in the last month and it behaves this way every time, so this is not just a fluke of bad timing. > Regression in time executor takes to do work since v3.0.1 ? > --- > > Key: SPARK-38792 > URL: https://issues.apache.org/jira/browse/SPARK-38792 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Danny Guinther >Priority: Major > Attachments: what-s-up-with-exec-actions.jpg > > > Hello! > I'm sorry to trouble you with this, but I'm seeing a noticeable regression in > performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I > don't believe it is specific to my application since the upgrade to 3.0.1 to > 3.2.1 is purely a configuration change. I'd guess it presents itself in my > application due to the high volume of work my application does, but I could > be mistaken. > The gist is that it seems like the executor actions I'm running suddenly > appear to take a lot longer on Spark 3.2.1. I don't have any ability to test > versions between 3.0.1 and 3.2.1 because my application was previously > blocked from upgrading beyond Spark 3.0.1 by > https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). > Any ideas what might cause this or metrics I might try to gather to pinpoint > the problem? I've tried a bunch of the suggestions from > [https://spark.apache.org/docs/latest/tuning.html] to see if any of those > help, but none of the adjustments I've tried have been fruitful. I also tried > to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] > for ideas as to what might have changed to cause this behavior, but haven't > seen anything that sticks out as being a possible source of the problem. > I have attached a graph that shows the drastic change in time taken by > executor actions. In the image the blue and purple lines are different kinds > of reads using the built-in JDBC data reader and the green line is writes > using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 > occurred at 9AM on the graph. The graph data comes from timing blocks that > surround only the calls to dataframe actions, so there shouldn't be anything > specific to my application that is suddenly inflating these numbers. > The driver process does seem to be seeing more GC churn then with Spark > 3.0.1, but I don't think that explains this behavior. The executors don't > seem to have any problem with memory or GC and are not overutilized (our > pipeline is very read and write heavy, less heavy on transformations, so > executors tend to be idle while waiting for various network I/O). > > Thanks in advance for any help! 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
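[Editor's note] The comment above mentions adding 8 more executors with 4 cores each without any improvement in throughput. For reference, this is roughly how that sizing is expressed through standard Spark configuration; the keys are real Spark settings, but the values here are illustrative, and on a hosted Databricks workspace the cluster UI normally controls them instead.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hypothetical values, not the reporter's actual cluster configuration.
val spark = SparkSession.builder()
  .appName("executor-sizing-example")
  .config("spark.executor.instances", "16") // previous count plus 8 more (hypothetical)
  .config("spark.executor.cores", "4")      // 4 cores per executor, as in the comment
  .getOrCreate()
```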
[jira] [Commented] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517704#comment-17517704 ] Danny Guinther commented on SPARK-38792: I don't know if it is helpful, but the runtime environment that the application is running in is a hosted Databricks workspace running in Azure. I have tried deploying the upgrade to 3.2.1 several times in the last month and it behaves this way every time, so this is not just a fluke of bad timing. > Regression in time executor takes to do work since v3.0.1 ? > --- > > Key: SPARK-38792 > URL: https://issues.apache.org/jira/browse/SPARK-38792 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Danny Guinther >Priority: Major > Attachments: what-s-up-with-exec-actions.jpg > > > Hello! > I'm sorry to trouble you with this, but I'm seeing a noticeable regression in > performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I > don't believe it is specific to my application since the upgrade to 3.0.1 to > 3.2.1 is purely a configuration change. I'd guess it presents itself in my > application due to the high volume of work my application does, but I could > be mistaken. > The gist is that it seems like the executor actions I'm running suddenly > appear to take a lot longer on Spark 3.2.1. I don't have any ability to test > versions between 3.0.1 and 3.2.1 because my application was previously > blocked from upgrading beyond Spark 3.0.1 by > https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). > Any ideas what might cause this or metrics I might try to gather to pinpoint > the problem? I've tried a bunch of the suggestions from > [https://spark.apache.org/docs/latest/tuning.html] to see if any of those > help, but none of the adjustments I've tried have been fruitful. I also tried > to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] > for ideas as to what might have changed to cause this behavior, but haven't > seen anything that sticks out as being a possible source of the problem. > I have attached a graph that shows the drastic change in time taken by > executor actions. In the image the blue and purple lines are different kinds > of reads using the built-in JDBC data reader and the green line is writes > using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 > occurred at 9AM on the graph. The graph data comes from timing blocks that > surround only the calls to dataframe actions, so there shouldn't be anything > specific to my application that is suddenly inflating these numbers. > The driver process does seem to be seeing more GC churn then with Spark > 3.0.1, but I don't think that explains this behavior. The executors don't > seem to have any problem with memory or GC and are not overutilized (our > pipeline is very read and write heavy, less heavy on transformations, so > executors tend to be idle while waiting for various network I/O). > > Thanks in advance for any help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792: --- Description: Hello! I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't believe it is specific to my application since the upgrade to 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents itself in my application due to the high volume of work my application does, but I could be mistaken. The gist is that it seems like the executor actions I'm running suddenly appear to take a lot longer on Spark 3.2.1. I don't have any ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). Any ideas what might cause this or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. I also tried to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but haven't seen anything that sticks out as being a possible source of the problem. I have attached a graph that shows the drastic change in time taken by executor actions. In the image the blue and purple lines are different kinds of reads using the built-in JDBC data reader and the green line is writes using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 9AM on the graph. The graph data comes from timing blocks that surround only the calls to dataframe actions, so there shouldn't be anything specific to my application that is suddenly inflating these numbers. The driver process does seem to be seeing more GC churn then with Spark 3.0.1, but I don't think that explains this behavior. The executors don't seem to have any problem with memory or GC and are not overutilized (our pipeline is very read and write heavy, less heavy on transformations, so executors tend to be idle while waiting for various network I/O). Thanks in advance for any help! was: Hello! I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't believe it is specific to my application since the upgrade to 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents itself in my application due to the high volume of work my application does, but I could be mistaken. The gist is that it seems like the executor actions I'm running suddenly appear to take a lot longer on Spark 3.2.1. I don't have any ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). Any ideas what might cause this or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. 
I also tried to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but haven't seen anything that sticks out as being a possible source of the problem. I have attached a graph that shows the drastic change in time taken by executor actions. In the image the blue and purple lines are different kinds of reads using the built-in JDBC data reader and the green line is writes using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 9AM on the graph. Thanks in advance for any help! > Regression in time executor takes to do work since v3.0.1 ? > --- > > Key: SPARK-38792 > URL: https://issues.apache.org/jira/browse/SPARK-38792 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Danny Guinther >Priority: Major > Attachments: what-s-up-with-exec-actions.jpg > > > Hello! > I'm sorry to trouble you with this, but I'm seeing a noticeable regression in > performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I > don't believe it is specific to my application since the upgrade to 3.0.1 to > 3.2.1 is purely a configuration change. I'd guess it presents itself in my > application due to the high volume of work my application does, but I could > be mistaken. > The gist is that it seems like the exe
[jira] [Updated] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
[ https://issues.apache.org/jira/browse/SPARK-38792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-38792: --- Attachment: what-s-up-with-exec-actions.jpg > Regression in time executor takes to do work since v3.0.1 ? > --- > > Key: SPARK-38792 > URL: https://issues.apache.org/jira/browse/SPARK-38792 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Danny Guinther >Priority: Major > Attachments: what-s-up-with-exec-actions.jpg > > > Hello! > I'm sorry to trouble you with this, but I'm seeing a noticeable regression in > performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I > don't believe it is specific to my application since the upgrade to 3.0.1 to > 3.2.1 is purely a configuration change. I'd guess it presents itself in my > application due to the high volume of work my application does, but I could > be mistaken. > The gist is that it seems like the executor actions I'm running suddenly > appear to take a lot longer on Spark 3.2.1. I don't have any ability to test > versions between 3.0.1 and 3.2.1 because my application was previously > blocked from upgrading beyond Spark 3.0.1 by > https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). > Any ideas what might cause this or metrics I might try to gather to pinpoint > the problem? I've tried a bunch of the suggestions from > [https://spark.apache.org/docs/latest/tuning.html] to see if any of those > help, but none of the adjustments I've tried have been fruitful. I also tried > to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] > for ideas as to what might have changed to cause this behavior, but haven't > seen anything that sticks out as being a possible source of the problem. > I have attached a graph that shows the drastic change in time taken by > executor actions. In the image the blue and purple lines are different kinds > of reads using the built-in JDBC data reader and the green line is writes > using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 > occurred at 9AM on the graph. > > Thanks in advance for any help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38792) Regression in time executor takes to do work since v3.0.1 ?
Danny Guinther created SPARK-38792: -- Summary: Regression in time executor takes to do work since v3.0.1 ? Key: SPARK-38792 URL: https://issues.apache.org/jira/browse/SPARK-38792 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.1 Reporter: Danny Guinther Attachments: what-s-up-with-exec-actions.jpg Hello! I'm sorry to trouble you with this, but I'm seeing a noticeable regression in performance when upgrading from 3.0.1 to 3.2.1 and I can't pin down why. I don't believe it is specific to my application since the upgrade to 3.0.1 to 3.2.1 is purely a configuration change. I'd guess it presents itself in my application due to the high volume of work my application does, but I could be mistaken. The gist is that it seems like the executor actions I'm running suddenly appear to take a lot longer on Spark 3.2.1. I don't have any ability to test versions between 3.0.1 and 3.2.1 because my application was previously blocked from upgrading beyond Spark 3.0.1 by https://issues.apache.org/jira/browse/SPARK-37391 (which I helped to fix). Any ideas what might cause this or metrics I might try to gather to pinpoint the problem? I've tried a bunch of the suggestions from [https://spark.apache.org/docs/latest/tuning.html] to see if any of those help, but none of the adjustments I've tried have been fruitful. I also tried to look in [https://spark.apache.org/docs/latest/sql-migration-guide.html] for ideas as to what might have changed to cause this behavior, but haven't seen anything that sticks out as being a possible source of the problem. I have attached a graph that shows the drastic change in time taken by executor actions. In the image the blue and purple lines are different kinds of reads using the built-in JDBC data reader and the green line is writes using a custom-built data writer. The deploy to switch from 3.0.1 to 3.2.1 occurred at 9AM on the graph. Thanks in advance for any help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464042#comment-17464042 ] Danny Guinther commented on SPARK-37391: I've created three PRs to facilitate backporting this change to 3.1.3: master PR: [https://github.com/apache/spark/pull/34745] branch-3.1 PR: [https://github.com/apache/spark/pull/34988] branch-3.2 PR: [https://github.com/apache/spark/pull/34989] > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37715) Docker integration tests: Tweak docs and remove unneeded dependency
[ https://issues.apache.org/jira/browse/SPARK-37715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463879#comment-17463879 ] Danny Guinther commented on SPARK-37715: PR here: https://github.com/apache/spark/pull/34979 > Docker integration tests: Tweak docs and remove unneeded dependency > --- > > Key: SPARK-37715 > URL: https://issues.apache.org/jira/browse/SPARK-37715 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Danny Guinther >Priority: Trivial > > These are a couple of changes I found worthwhile while running docker > integration tests for [#34745|https://github.com/apache/spark/pull/34745]. > The doc changes are minor fixes to add the missing repository to the > suggested command to run the docker integration tests. > The library change relates to this comment: [#34745 > (comment)|https://github.com/apache/spark/pull/34745#discussion_r773084417]; > I don't know if my testing was thorough enough, but I found that the > referenced dependency was not needed and could be removed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37715) Docker integration tests: Tweak docs and remove unneeded dependency
Danny Guinther created SPARK-37715: -- Summary: Docker integration tests: Tweak docs and remove unneeded dependency Key: SPARK-37715 URL: https://issues.apache.org/jira/browse/SPARK-37715 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Danny Guinther These are a couple of changes I found worthwhile while running docker integration tests for [#34745|https://github.com/apache/spark/pull/34745]. The doc changes are minor fixes to add the missing repository to the suggested command to run the docker integration tests. The library change relates to this comment: [#34745 (comment)|https://github.com/apache/spark/pull/34745#discussion_r773084417]; I don't know if my testing was thorough enough, but I found that the referenced dependency was not needed and could be removed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447529#comment-17447529 ] Danny Guinther commented on SPARK-37391: Here's an example stacktrace for one of the blocked threads: {{org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:92)}} {{org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:63)}} {{org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$$Lambda$6294/1994845663.apply(Unknown Source)}} {{org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)}} {{org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)}} {{org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)}} {{org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)}} {{org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:444)}} {{org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:400)}} {{org.apache.spark.sql.DataFrameReader$$Lambda$6224/1118373872.apply(Unknown Source)}} {{scala.Option.getOrElse(Option.scala:189)}} {{org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:400)}} {{org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:273)}} {{}} {{scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}} {{scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)}} {{scala.concurrent.Future$$$Lambda$442/341778327.apply(Unknown Source)}} {{scala.util.Success.$anonfun$map$1(Try.scala:255)}} {{scala.util.Success.map(Try.scala:213)}} {{scala.concurrent.Future.$anonfun$map$1(Future.scala:292)}} {{scala.concurrent.Future$$Lambda$443/424848797.apply(Unknown Source)}} {{scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)}} {{scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)}} {{scala.concurrent.impl.Promise$$Lambda$444/1710905079.apply(Unknown Source)}} {{scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)}} {{java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}} {{java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}} {{java.lang.Thread.run(Thread.java:748)}} The stacktrace from the thread that is holding the lock looks like so: {{java.net.SocketInputStream.socketRead0(Native Method)}} {{java.net.SocketInputStream.socketRead(SocketInputStream.java:116)}} {{java.net.SocketInputStream.read(SocketInputStream.java:171)}} {{java.net.SocketInputStream.read(SocketInputStream.java:141)}} {{com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1019)}} {{com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1009)}} {{sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:476)}} {{sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:470)}} {{sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)}} {{sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1364)}} {{sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)}} {{sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:973)}} {{com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2058)}} {{com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6617) => holding 
Monitor(com.microsoft.sqlserver.jdbc.TDSReader@1035497922})}} {{com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7805)}} {{com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7768)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5332)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:4066)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:85)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:4004)}} {{com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272) => holding Monitor(java.lang.Object@564746804})}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2768)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2418)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2265)}} {{com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1291)}} {{com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:881)}} {{org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(
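[Editor's note] The thread dumps above show many reader threads parked on SecurityConfigurationLock while a single thread performs a slow, network-bound JDBC login. In effect the driver ends up in a pattern like the following sketch, which is illustrative only and not the actual Spark source: every connection attempt funnels through one global lock, so concurrent logins complete one at a time.

```scala
import java.sql.{Connection, DriverManager}

// Illustrative stand-in for the global lock the thread dumps point at.
object GlobalSecurityLock

object SerializedConnections {
  // Every caller must hold the same lock while the JDBC login runs,
  // so N threads asking for connections are serialized behind one another.
  def connect(url: String): Connection = GlobalSecurityLock.synchronized {
    DriverManager.getConnection(url)
  }
}
```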
[jira] [Commented] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447512#comment-17447512 ] Danny Guinther commented on SPARK-37391: [~hyukjin.kwon] , sorry, I seem to have gotten confused when identifying the source of the regression. I have updated the title and description to reflect the true source of the issue. I'm inclined to blame this change: https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58 I'm sorry, but I don't have the capacity to provide a self-contained reproduction of the issue. Hopefully the problem is obvious enough that you will be able to see what is going on from the anecdotal evidence I can provide. The introduction of SecurityConfigurationLock.synchronized prevents a given JDBC Driver from establishing more than one connection at a time (or at least severely limits the concurrency). This is a significant bottleneck for applications that use a single JDBC driver to establish many database connections. The anecdotal evidence I can offer to support this claim: 1. I've attached a screenshot of some dashboards we use to monitor the QA deployment of the application in question. These graphs come from a 4.5 hour window where I had spark 3.1.2 deployed to QA. On the left side of the graph we were running Spark 2.4.5; in the middle we were running spark 3.1.2; and on the right side of the graph we are running spark 3.0.1. # The "Success Rate", "CountActiveTasks", "CountActiveJobs", "CountTableTenantJobStart", "CountTableTenantJobEnd" graphs all aim to demonstrate that with the deployment of spark 3.1.2 the throughput of the application was significantly reduced across the board. # The "Overall Active Thread Count", "Count Active Executors", and "CountDeadExecutors" graphs all aim to evidence that there was no change in the number of resources allocated to do work. # The "Max MinsSinceLastAttempt" graph should normally be a flat line unless the application is falling behind on the work that it is scheduled to do. It can be seen during the period of the spark 3.1.2 deployment the application is falling behind at a linear rate and begins to recover once spark 3.0.1 is deployed. !spark-regression-dashes.jpg! 2. I've attached a screenshot of the thread dump from the spark driver process. It can be seen that many, many threads are blocked waiting for SecurityConfigurationLock. The screenshot only shows a handful of threads but there are 98 threads in total blocked wiating for the SecurityConfigurationLock. !so-much-blocking.jpg! It's worth noting that our QA deployment does significantly less work than our production deployment; if the QA deployment can't keep up then the production deployment has no chance. On the bright side, I had success updating the production deployment to spark 3.0.1 and that seems to be stable. Unfortunately, we use Databricks for our spark vendor and the LTS release they have that supports spark 3.0.1 is only scheduled to be maintained until September 2022, so we can't avoid this regression forever. If I can answer any questions or provide any more info, please let me know. Thanks in advance! 
> SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
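[Editor's note] The blocked-thread stack trace earlier in this thread shows the application-side shape of the workload: many reads submitted concurrently as Futures, each needing its own JDBC connection before any rows flow. A rough sketch of that pattern, with hypothetical URLs and table names, is below; with connection creation serialized on the driver, these Futures queue up even though executor capacity sits idle.

```scala
import scala.concurrent.{ExecutionContext, Future}
import org.apache.spark.sql.SparkSession

// Sketch of an app that fans out JDBC reads across many databases at once.
def readAllTenants(spark: SparkSession, jdbcUrls: Seq[String])
                  (implicit ec: ExecutionContext): Seq[Future[Long]] =
  jdbcUrls.map { url =>
    Future {
      spark.read
        .format("jdbc")
        .option("url", url)              // e.g. jdbc:sqlserver://... (hypothetical)
        .option("dbtable", "dbo.events") // hypothetical table
        .load()
        .count()
    }
  }
```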
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Attachment: so-much-blocking.jpg > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: so-much-blocking.jpg, spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Attachment: spark-regression-dashes.jpg > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > Attachments: spark-regression-dashes.jpg > > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-32001
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Summary: SIGNIFICANT bottleneck introduced by fix for SPARK-32001 (was: SIGNIFICANT bottleneck introduced by fix for SPARK-34497) > SIGNIFICANT bottleneck introduced by fix for SPARK-32001 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Description: The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] ) does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! was: The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58|https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] ) does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! > SIGNIFICANT bottleneck introduced by fix for SPARK-34497 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > > The fix for https://issues.apache.org/jira/browse/SPARK-32001 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Description: The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58|https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] ) does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! was: The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/31622] ) does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! > SIGNIFICANT bottleneck introduced by fix for SPARK-34497 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > > The fix for SPARK-34497 ( > [https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58|https://github.com/apache/spark/pull/29024/files#diff-345beef18081272d77d91eeca2d9b5534ff6e642245352f40f4e9c9b8922b085R58] > ) does not seem to have consider the reality that some apps may rely on > being able to establish many JDBC connections simultaneously for performance > reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Description: The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/31622] ) does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! was: The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622)] does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! > SIGNIFICANT bottleneck introduced by fix for SPARK-34497 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > > The fix for SPARK-34497 ( [https://github.com/apache/spark/pull/31622] ) does > not seem to have consider the reality that some apps may rely on being able > to establish many JDBC connections simultaneously for performance reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497
[ https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Guinther updated SPARK-37391: --- Description: The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622)] does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! was: The fix for SPARK-34497 does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! > SIGNIFICANT bottleneck introduced by fix for SPARK-34497 > > > Key: SPARK-37391 > URL: https://issues.apache.org/jira/browse/SPARK-37391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0 > Environment: N/A >Reporter: Danny Guinther >Priority: Major > > The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622)] does > not seem to have consider the reality that some apps may rely on being able > to establish many JDBC connections simultaneously for performance reasons. > The fix forces concurrency to 1 when establishing database connections and > that strikes me as a *significant* user impacting change and a *significant* > bottleneck. > Can anyone propose a workaround for this? I have an app that makes > connections to thousands of databases and I can't upgrade to any version > >3.1.x because of this significant bottleneck. > > Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497
Danny Guinther created SPARK-37391: -- Summary: SIGNIFICANT bottleneck introduced by fix for SPARK-34497 Key: SPARK-37391 URL: https://issues.apache.org/jira/browse/SPARK-37391 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0 Environment: N/A Reporter: Danny Guinther The fix for SPARK-34497 does not seem to have consider the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons. The fix forces concurrency to 1 when establishing database connections and that strikes me as a *significant* user impacting change and a *significant* bottleneck. Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases and I can't upgrade to any version >3.1.x because of this significant bottleneck. Thanks in advance for your help! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC
[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826378#comment-16826378 ] Danny Guinther commented on SPARK-19335: Any update on this? Also, please forgive this dumb question, but I'm shocked that there's not more demand for this feature which makes me wonder if I have major misconceptions about Spark and its intended use. How do users survive without this functionality? I take it that the destination SQL database should have flexible up-time requirements allowing for drastic changes? The overwrite save mode is the only thing that offers anything like an UPDATE, but totally dropping/truncating the destination table seems inconceivable for many production environments. What am I missing? > Spark should support doing an efficient DataFrame Upsert via JDBC > - > > Key: SPARK-19335 > URL: https://issues.apache.org/jira/browse/SPARK-19335 > Project: Spark > Issue Type: Improvement >Reporter: Ilya Ganelin >Priority: Minor > > Doing a database update, as opposed to an insert is useful, particularly when > working with streaming applications which may require revisions to previously > stored data. > Spark DataFrames/DataSets do not currently support an Update feature via the > JDBC Writer allowing only Overwrite or Append. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
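[Editor's note] Since the built-in JDBC writer only offers append and overwrite, the workaround commonly used in practice is to append the DataFrame into a staging table and then run an upsert statement over a plain JDBC connection. The sketch below uses hypothetical table and column names, and the MERGE syntax varies by database; it is not a Spark-provided API.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch of a two-step upsert: stage with the supported JDBC writer, then MERGE server-side.
def upsertViaStaging(df: DataFrame, url: String, user: String, password: String): Unit = {
  // Step 1: land the new rows in a staging table using append mode.
  df.write
    .format("jdbc")
    .option("url", url)
    .option("dbtable", "dbo.target_staging") // hypothetical staging table
    .option("user", user)
    .option("password", password)
    .mode(SaveMode.Append)
    .save()

  // Step 2: merge staging into the target table with a single server-side statement.
  val conn = DriverManager.getConnection(url, user, password)
  try {
    conn.createStatement().executeUpdate(
      """MERGE INTO dbo.target AS t
        |USING dbo.target_staging AS s ON t.id = s.id
        |WHEN MATCHED THEN UPDATE SET t.value = s.value
        |WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);""".stripMargin)
    conn.createStatement().executeUpdate("TRUNCATE TABLE dbo.target_staging")
  } finally conn.close()
}
```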