GitHub user rednaxelafx commented on the issue:
https://github.com/apache/spark/pull/22847
Just in case people wonder, the following is the hack patch that I used for
stress testing code splitting before this PR:
```diff
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -647,11 +647,13 @@ class CodegenContext(val useStreamlining: Boolean) {
    * Returns a term name that is unique within this instance of a `CodegenContext`.
    */
   def freshName(name: String): String = synchronized {
-    val fullName = if (freshNamePrefix == "") {
+    // hack: intentionally add a very long prefix (length=300 characters) to
+    // trigger code splitting more frequently
+    val fullName = ("averylongprefix" * 20) + (if (freshNamePrefix == "") {
       name
     } else {
       s"${freshNamePrefix}_$name"
-    }
+    })
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
```
Of course, with this PR we can now simply set the split threshold to a
very low value (e.g. `1`) to force splitting.
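
A rough sketch of that last point, for anyone who wants to try it. It assumes the threshold is exposed as the SQL conf `spark.sql.codegen.methodSplitThreshold`; the key name is not stated in this comment, so please check it against the PR. The app name and the sample query are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: the conf key below is what this PR introduces; verify against the PR.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("force-codegen-split") // hypothetical name
  .config("spark.sql.codegen.methodSplitThreshold", "1")
  .getOrCreate()

// Any projection with several expressions should now be split into many
// small generated methods, since each one exceeds a 1-character threshold.
val projected = spark.range(10)
  .select((0 until 50).map(i => (col("id") + i).as(s"c$i")): _*)

projected.collect()
spark.stop()
```

This exercises the same splitting path as the long-prefix hack without patching `freshName`, so the generated identifiers stay readable when debugging.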