[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-11-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22847


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-11-01 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r230248773
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

@rednaxelafx the wide table benchmark I used has 400 columns; whole-stage codegen is disabled by default.


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229943260
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

Oh I see, you're using the column name... that's not the right place to put the "prefix". Column names are almost never carried over to the generated code in the current framework (the only exception is the lambda variable name).


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229942325
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

The "freshNamePrefix" prefix is only applied in whole-stage codegen:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L87

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L169

It doesn't take effect in non-whole-stage codegen.

If you intend to stress-test expression codegen but don't see the prefix being prepended, you're probably not adding it in the right place. Where did you add it?


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229919857
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

Seems like long alias names have no influence.
```
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
[info] projection on wide table:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------
[info] split threshold 10                       6512 / 6736          0.2        6210.4       1.0X
[info] split threshold 100                      5730 / 6329          0.2        5464.9       1.1X
[info] split threshold 1024                     3119 / 3184          0.3        2974.6       2.1X
[info] split threshold 2048                     2981 / 3100          0.4        2842.9       2.2X
[info] split threshold 4096                     3289 / 3379          0.3        3136.6       2.0X
[info] split threshold 8196                     4307 / 4338          0.2        4108.0       1.5X
[info] split threshold 65536                  29147 / 30212          0.0       27797.0       0.2X
```

No `averylongprefixrepeatedmultipletimes` in the **expression code gen**:
```
/* 047 */   private void createExternalRow_0_8(InternalRow i, Object[] values_0) {
/* 048 */
/* 049 */     // input[80, bigint, false]
/* 050 */     long value_81 = i.getLong(80);
/* 051 */     if (false) {
/* 052 */       values_0[80] = null;
/* 053 */     } else {
/* 054 */       values_0[80] = value_81;
/* 055 */     }
/* 056 */
/* 057 */     // input[81, bigint, false]
/* 058 */     long value_82 = i.getLong(81);
/* 059 */     if (false) {
/* 060 */       values_0[81] = null;
/* 061 */     } else {
/* 062 */       values_0[81] = value_82;
/* 063 */     }
/* 064 */
/* 065 */     // input[82, bigint, false]
/* 066 */     long value_83 = i.getLong(82);
/* 067 */     if (false) {
/* 068 */       values_0[82] = null;
/* 069 */     } else {
/* 070 */       values_0[82] = value_83;
/* 071 */     }
/* 072 */
...
```

My benchmark:
```
object WideTableBenchmark extends SqlBasedBenchmark {

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    runBenchmark("projection on wide table") {
      val N = 1 << 20
      val df = spark.range(N)
      val columns = (0 until 400).map { i => s"id as averylongprefixrepeatedmultipletimes_id$i" }
      val benchmark = new Benchmark("projection on wide table", N, output = output)
      Seq("10", "100", "1024", "2048", "4096", "8196", "65536").foreach { n =>
        benchmark.addCase(s"split threshold $n", numIters = 5) { iter =>
          withSQLConf("spark.testing.codegen.splitThreshold" -> n) {
            df.selectExpr(columns: _*).foreach(identity(_))
          }
        }
      }
      benchmark.run()
    }
  }
}
```

Will keep benchmarking for complex expressions.


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229879855
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

Could you try some very long alias names or complex expressions? You will get different numbers, right?


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229638917
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

@kiszk agree, `1000` might not be the best; see my benchmark for the wide table, `2048` is better.
```
================================================================================================
projection on wide table
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
projection on wide table:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
split threshold 10                       8464 / 8737          0.1        8072.0       1.0X
split threshold 100                      5959 / 6251          0.2        5683.4       1.4X
split threshold 1024                     3202 / 3248          0.3        3053.2       2.6X
split threshold 2048                     3009 / 3097          0.3        2869.2       2.8X
split threshold 4096                     3414 / 3458          0.3        3256.1       2.5X
split threshold 8196                     4095 / 4112          0.3        3905.5       2.1X
split threshold 65536                  28800 / 29705          0.0       27465.8       0.3X
```


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229577559
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,18 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source-code splitting in the codegen. When the number of characters " +
+      "in a single JAVA function (without comment) exceeds the threshold, the function will be " +
--- End diff --

nit: `JAVA` -> `Java`


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229577345
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

1000 is conservative. But there is no recommendation, since the bytecode size depends on the content (e.g. `0`'s bytecode length is 1; `9`'s bytecode length is 2).
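
As a minimal sketch of this point (object and variable names here are hypothetical, and the instruction names refer to standard JVM bytecode): two integer literals of identical source length can compile to load instructions of different widths, which is why source length is only a rough proxy for bytecode size.
```
// Sketch: equally long source literals, differently sized bytecode.
object LiteralBytecode {
  def main(args: Array[String]): Unit = {
    var a = 0 // a load of 0 uses iconst_0: a 1-byte JVM instruction
    var b = 9 // a load of 9 uses bipush 9: a 2-byte JVM instruction
    println(a + b)
  }
}
```
Compiling this and inspecting it with `javap -c` would show the differing instruction widths.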


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229572426
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,18 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We cannot know how many bytecode will " +
--- End diff --

`The threshold of source-code splitting in the codegen. When the number of characters in a single JAVA function (without comment) exceeds the threshold, the function will be automatically split to multiple smaller ones.`


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229538148
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We cannot know how many bytecode will " +
+      "be generated, so use the code length as metric. When running on HotSpot, a function's " +
+      "bytecode should not go beyond 8KB, otherwise it will not be JITted; it also should not " +
+      "be too small, otherwise there will be many function calls.")
+    .intConf
+    .createWithDefault(1024)
--- End diff --

let's add a value check to make sure the value is positive. We can figure out a lower and upper bound later.
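
A sketch of what such a check could look like on this entry (the error message is illustrative, and the `.doc("...")` text is elided; `checkValue` is the validation hook already used by other entries in SQLConf):
```
val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
  .internal()
  .doc("...")  // doc text as discussed above
  .intConf
  .checkValue(threshold => threshold > 0, "The threshold must be positive.")
  .createWithDefault(1024)
```
With this in place, setting the conf to `0` or a negative value fails fast at set time instead of producing degenerate splitting behavior.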


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229478558
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

Yep. That could be `max`.


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228789484
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
--- End diff --

Not a big deal but I would avoid abbreviation in documentation. `can't` -> `cannot`


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-28 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228788123
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

To be more accurate, I think I should add `When running on HotSpot, a function's bytecode should not go beyond 8KB...`.



---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228755710
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The threshold of source code length without comment of a single Java function by " +
+      "codegen to be split. When the generated Java function source code exceeds this threshold" +
+      ", it will be split into multiple small functions. We can't know how many bytecode will " +
+      "be generated, so use the code length as metric. A function's bytecode should not go " +
+      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+      "there will be many function calls.")
+    .intConf
--- End diff --

According to the description, it seems that we had better have `checkValue` here. Could you recommend reasonable min/max values, @kiszk ?


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228600757
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The maximum source code length of a single Java function by codegen. When the " +
+      "generated Java function source code exceeds this threshold, it will be split into " +
+      "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
+      " A function's bytecode should not go beyond 8KB, otherwise it will not be JITted, should " +
+      "also not be too small, or we will have many function calls. We can't know how many " +
--- End diff --

it will not be JITted; it also should not be too small, otherwise there will be many function calls.


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-26 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228598058
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The maximum source code length of a single Java function by codegen. When the " +
+      "generated Java function source code exceeds this threshold, it will be split into " +
+      "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
--- End diff --

IMHO, `spark.sql.codegen.methodSplitThreshold` can be used, but the description should be changed. For example:
`The threshold of source code length without comment of a single Java function by codegen to be split. When the generated Java function source code exceeds this threshold, it will be split into multiple small functions. ...`


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r228483780
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
     .intConf
     .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+    .internal()
+    .doc("The maximum source code length of a single Java function by codegen. When the " +
+      "generated Java function source code exceeds this threshold, it will be split into " +
+      "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
--- End diff --

`each function length is spark.sql.codegen.methodSplitThreshold` — this is not true; the method size is always larger than the threshold. cc @kiszk, any idea about the naming and description of this config?


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-26 Thread yucai
GitHub user yucai opened a pull request:

https://github.com/apache/spark/pull/22847

[SPARK-25850][SQL] Make the split threshold for the code generated method configurable

## What changes were proposed in this pull request?
As per the [discussion](https://github.com/apache/spark/pull/22823/files#r228400706), add a new configuration to make the split threshold for the code generated method configurable.

When the generated Java function source code exceeds the split threshold, it will be split into multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold.

## How was this patch tested?
manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yucai/spark splitThreshold

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22847


commit 188a9476e3504c151ebe27b362a080469c262674
Author: yucai
Date:   2018-10-26T08:00:24Z

[SPARK-25850][SQL] Make the split threshold for the code generated method configurable




---
