[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

rednaxelafx Wed, 31 Oct 2018 22:34:39 -0700

Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Just in case people wonder, the following is the hack patch that I used for stress testing code splitting before this PR:
    ```diff
    --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
    +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
    @@ -647,11 +647,13 @@ class CodegenContext(val useStreamlining: Boolean) {
        * Returns a term name that is unique within this instance of a `CodegenContext`.
        */
       def freshName(name: String): String = synchronized {
    -    val fullName = if (freshNamePrefix == "") {
    +    // hack: intentionally add a very long prefix (length=300 characters) to
    +    // trigger code splitting more frequently
    +    val fullName = ("averylongprefix" * 20) + (if (freshNamePrefix == "") {
           name
         } else {
           s"${freshNamePrefix}_$name"
    -    }
    +    })
         if (freshNameIds.contains(fullName)) {
           val id = freshNameIds(fullName)
           freshNameIds(fullName) = id + 1
    ```
    Of course, now with this PR, we can simply set the split threshold to a very low value (e.g. `1`) to force splitting.
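
    To illustrate why both approaches trigger more splits, here is a toy model in Scala. It is only a sketch and not Spark's actual `CodegenContext` or splitting code: `SplitSketch`, `split`, `snippets`, and the simplified `freshName` are hypothetical names, and the sketch assumes a purely length-based rule where a new block starts once the accumulated snippet length exceeds the threshold. The value `1024` is just an example threshold for the sketch.

    ```scala
    // Toy model only: NOT Spark's real CodegenContext or expression-splitting logic.
    object SplitSketch {
      // Hypothetical stand-in for freshName: optional prefix + term name + numeric id.
      def freshName(prefix: String, name: String, id: Int): String =
        s"$prefix${name}_$id"

      // Assumed rule: start a new block once the running character count of
      // the current block exceeds `threshold`.
      def split(snippets: Seq[String], threshold: Int): Seq[String] = {
        val blocks = scala.collection.mutable.ArrayBuffer.empty[String]
        val current = new StringBuilder
        for (code <- snippets) {
          if (current.length > threshold) {
            blocks += current.toString
            current.clear()
          }
          current.append(code)
        }
        if (current.nonEmpty) blocks += current.toString
        blocks.toSeq
      }

      // Ten tiny "generated" statements whose length depends on the name prefix.
      def snippets(prefix: String): Seq[String] =
        (0 until 10).map(i => s"int ${freshName(prefix, "value", i)} = $i;\n")

      def main(args: Array[String]): Unit = {
        val longPrefix = "averylongprefix" * 20 // 300 characters, as in the hack patch
        println(split(snippets(""), 1024).size)         // short names: 1 block
        println(split(snippets(longPrefix), 1024).size) // long names: 3 blocks
        println(split(snippets(""), 1).size)            // tiny threshold: 10 blocks
      }
    }
    ```

    In this model, inflating every name and lowering the threshold have the same effect on block count, but the threshold approach leaves the generated code itself unchanged.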


