[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-27 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r251294071
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
 
 Review comment:
   If that's around this code and that's the only the one, yea, let's do that. 
If there are multiple across this files, let's don't include.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249318022
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
+"encounter memory issue when broadcast table we can decrease this 
number." +
 
 Review comment:
   `this number in order to ...`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317993
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
+"encounter memory issue when broadcast table we can decrease this 
number." +
 
 Review comment:
   `memory issue`: can you elaborate which memory issue here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317928
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
+"encounter memory issue when broadcast table we can decrease this 
number." +
+"Notice the number should be carefully chosen since decrease 
parallelism will " +
+"cause longer waiting for other broadcasting.And increase parallelism 
may " +
 
 Review comment:
   `broadcasting.And` -> `broadcasting. Also, increasing`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317928
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
+"encounter memory issue when broadcast table we can decrease this 
number." +
+"Notice the number should be carefully chosen since decrease 
parallelism will " +
+"cause longer waiting for other broadcasting.And increase parallelism 
may " +
 
 Review comment:
   `broadcasting.And` -> `broadcasting. Also, incresing`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317902
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
+"encounter memory issue when broadcast table we can decrease this 
number." +
+"Notice the number should be carefully chosen since decrease 
parallelism will " +
 
 Review comment:
   `will` -> `might`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317880
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
+"encounter memory issue when broadcast table we can decrease this 
number." +
+"Notice the number should be carefully chosen since decrease 
parallelism will " +
 
 Review comment:
   `decrease ` -> `decreasing`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317812
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,13 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("The maximum degree of parallelism to fetch and broadcast the 
table.If we " +
 
 Review comment:
   extra sapce -> `table. If`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-20 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r249317752
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala
 ##
 @@ -132,4 +135,33 @@ class ExchangeSuite extends SparkPlanTest with 
SharedSQLContext {
 val projection2 = cached.select("_1", "_3").queryExecution.executedPlan
 assert(!projection1.sameResult(projection2))
   }
+
+  test("SPARK-26601: Make broadcast-exchange thread pool configurable") {
+val previousNumber = SparkSession.getActiveSession.get.sparkContext.conf
+  .get(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER)
+
+SparkSession.getActiveSession.get.sparkContext.conf.
+  set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER, 1)
+
assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) 
=== 1)
+
+Future {
+  Thread.sleep(5*1000)
+} (BroadcastExchangeExec.executionContext)
+
+val f = Future {} (BroadcastExchangeExec.executionContext)
 
 Review comment:
   You don't have to test Java's thread executors. Can you just check if 
`BroadcastExchangeExec.executionContext .getMaximumPoolSize` is as configured?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-17 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r248621438
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala
 ##
 @@ -132,4 +136,29 @@ class ExchangeSuite extends SparkPlanTest with 
SharedSQLContext {
 val projection2 = cached.select("_1", "_3").queryExecution.executedPlan
 assert(!projection1.sameResult(projection2))
   }
+
+  test("SPARK-26601: Make broadcast-exchange thread pool configurable") {
+val sparkConf = new SparkConf()
+  .set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER.key, "1")
+  .set("spark.driver.allowMultipleContexts", "true")
+val tss = new TestSparkSession(sparkConf)
+SparkSession.setActiveSession(tss)
+
assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) 
=== 1)
+
+Future {
+  Thread.sleep(5*1000)
+}(BroadcastExchangeExec.executionContext)
 
 Review comment:
   For style, please take a look for 
https://github.com/databricks/scala-style-guide


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-17 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r248621249
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala
 ##
 @@ -132,4 +136,29 @@ class ExchangeSuite extends SparkPlanTest with 
SharedSQLContext {
 val projection2 = cached.select("_1", "_3").queryExecution.executedPlan
 assert(!projection1.sameResult(projection2))
   }
+
+  test("SPARK-26601: Make broadcast-exchange thread pool configurable") {
+val sparkConf = new SparkConf()
+  .set(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER.key, "1")
+  .set("spark.driver.allowMultipleContexts", "true")
 
 Review comment:
   `allowMultipleContexts` this is removed out as of 
https://github.com/apache/spark/pull/23311


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-14 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r247748689
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala
 ##
 @@ -132,4 +132,12 @@ class ExchangeSuite extends SparkPlanTest with 
SharedSQLContext {
 val projection2 = cached.select("_1", "_3").queryExecution.executedPlan
 assert(!projection1.sameResult(projection2))
   }
+
+  test("SPARK-26601: Make broadcast-exchange thread pool configurable") {
+
+withSQLConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER.key -> "1") {
+  
assert(SQLConf.get.getConf(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER) 
=== 1)
+  assert(BroadcastExchangeExec.executionContext != null)
 
 Review comment:
   `BroadcastExchangeExec.executionContext` can't be null. Looks it doesn;t 
properly test the change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-14 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r247748426
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,10 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("MAX number of threads can hold by BroadcastExchangeExec which 
controls" +
 
 Review comment:
   Still, users won't know what's `BroadcastExchangeExec` and `MAX number of 
threads`. Please do your diligence for this doc. Also, we might have to 
describe when to increase and when to decrease.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-12 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r247333148
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -126,4 +126,10 @@ object StaticSQLConf {
   .intConf
   .createWithDefault(1000)
 
+  val MAX_BROADCAST_EXCHANGE_THREADNUMBER =
+buildStaticConf("spark.sql.broadcastExchange.maxThreadNumber")
+  .doc("MAX number of threads can hold by BroadcastExchangeExec.")
 
 Review comment:
   Can you elaborate this in the doc about what this number controls? For 
instance, it controls the parallelism of fetching and broadcasting the table.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] Make broadcast-exchange thread pool configurable

2019-01-12 Thread GitBox
HyukjinKwon commented on a change in pull request #23519: [SPARK-26601][SQL] 
Make broadcast-exchange thread pool configurable
URL: https://github.com/apache/spark/pull/23519#discussion_r247333106
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
 ##
 @@ -157,5 +157,6 @@ case class BroadcastExchangeExec(
 
 object BroadcastExchangeExec {
   private[execution] val executionContext = 
ExecutionContext.fromExecutorService(
-ThreadUtils.newDaemonCachedThreadPool("broadcast-exchange", 128))
+ThreadUtils.newDaemonCachedThreadPool("broadcast-exchange",
+  new SparkConf().get(StaticSQLConf.MAX_BROADCAST_EXCHANGE_THREADNUMBER)))
 
 Review comment:
   `SQLConf.get`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org