[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r251294071 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + Review comment: If that's around this code and that's the only the one, yea, let's do that. If there are multiple across this files, let's don't include. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249318022 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + +"encounter memory issue when broadcast table we can decrease this number." + Review comment: `this number in order to ...` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317993 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + +"encounter memory issue when broadcast table we can decrease this number." + Review comment: `memory issue`: can you elaborate which memory issue here? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317928 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + +"encounter memory issue when broadcast table we can decrease this number." + +"Notice the number should be carefully chosen since decrease parallelism will " + +"cause longer waiting for other broadcasting.And increase parallelism may " + Review comment: `broadcasting.And` -> `broadcasting. Also, increasing` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317928 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + +"encounter memory issue when broadcast table we can decrease this number." + +"Notice the number should be carefully chosen since decrease parallelism will " + +"cause longer waiting for other broadcasting.And increase parallelism may " + Review comment: `broadcasting.And` -> `broadcasting. Also, incresing` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317902 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + +"encounter memory issue when broadcast table we can decrease this number." + +"Notice the number should be carefully chosen since decrease parallelism will " + Review comment: `will` -> `might` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317880 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + +"encounter memory issue when broadcast table we can decrease this number." + +"Notice the number should be carefully chosen since decrease parallelism will " + Review comment: `decrease ` -> `decreasing` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317812 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,13 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("The maximum degree of parallelism to fetch and broadcast the table.If we " + Review comment: extra sapce -> `table. If` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r249317752 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala ## @@ -132,4 +135,33 @@ class ExchangeSuite extends SparkPlanTest with SharedSQLContext { val projection2 = cached.select("_1", "_3").queryExecution.executedPlan assert(!projection1.sameResult(projection2)) } + + test("SPARK-26601: Make broadcast-exchange thread pool configurable") { +val previousNumber = SparkSession.getActiveSession.get.sparkContext.conf + .get(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) + +SparkSession.getActiveSession.get.sparkContext.conf. + set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER, 1) + assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) === 1) + +Future { + Thread.sleep(5*1000) +} (BroadcastExchangeExec.executionContext) + +val f = Future {} (BroadcastExchangeExec.executionContext) Review comment: You don't have to test Java's thread executors. Can you just check if `BroadcastExchangeExec.executionContext .getMaximumPoolSize` is as configured? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r248621438 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala ## @@ -132,4 +136,29 @@ class ExchangeSuite extends SparkPlanTest with SharedSQLContext { val projection2 = cached.select("_1", "_3").queryExecution.executedPlan assert(!projection1.sameResult(projection2)) } + + test("SPARK-26601: Make broadcast-exchange thread pool configurable") { +val sparkConf = new SparkConf() + .set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER.key, "1") + .set("spark.driver.allowMultipleContexts", "true") +val tss = new TestSparkSession(sparkConf) +SparkSession.setActiveSession(tss) + assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) === 1) + +Future { + Thread.sleep(5*1000) +}(BroadcastExchangeExec.executionContext) Review comment: For style, please take a look for https://github.com/databricks/scala-style-guide This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r248621249 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala ## @@ -132,4 +136,29 @@ class ExchangeSuite extends SparkPlanTest with SharedSQLContext { val projection2 = cached.select("_1", "_3").queryExecution.executedPlan assert(!projection1.sameResult(projection2)) } + + test("SPARK-26601: Make broadcast-exchange thread pool configurable") { +val sparkConf = new SparkConf() + .set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER.key, "1") + .set("spark.driver.allowMultipleContexts", "true") Review comment: `allowMultipleContexts` this is removed out as of https://github.com/apache/spark/pull/23311 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r247748689 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala ## @@ -132,4 +132,12 @@ class ExchangeSuite extends SparkPlanTest with SharedSQLContext { val projection2 = cached.select("_1", "_3").queryExecution.executedPlan assert(!projection1.sameResult(projection2)) } + + test("SPARK-26601: Make broadcast-exchange thread pool configurable") { + +withSQLConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER.key -> "1") { + assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) === 1) + assert(BroadcastExchangeExec.executionContext != null) Review comment: `BroadcastExchangeExec.executionContext` can't be null. Looks it doesn;t properly test the change. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r247748426 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,10 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("MAX number of threads can hold by BroadcastExchangeExec which controls" + Review comment: Still, users won't know what's `BroadcastExchangeExec` and `MAX number of threads`. Please do your diligence for this doc. Also, we might have to describe when to increase and when to decrease. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r247333148 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -126,4 +126,10 @@ object StaticSQLConf { .intConf .createWithDefault(1000) + val MAX_BROADCAST_EXCHANGE_THREADNUMBER = +buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber") + .doc("MAX number of threads can hold by BroadcastExchangeExec.") Review comment: Can you elaborate this in the doc about what this number controls? For instance, it controls the parallelism of fetching and broadcasting the table. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable URL: https://github.com/apache/spark/pull/23519#discussion_r247333106 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -157,5 +157,6 @@ case class BroadcastExchangeExec( object BroadcastExchangeExec { private[execution] val executionContext = ExecutionContext.fromExecutorService( -ThreadUtils.newDaemonCachedThreadPool("broadcast-exchange", 128)) +ThreadUtils.newDaemonCachedThreadPool("broadcast-exchange", + new SparkConf().get(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER))) Review comment: `SQLConf.get` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org