[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

rednaxelafx Wed, 31 Jan 2018 15:36:11 -0800

Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/20419
  
    I did the experiment that I asked about at 
https://github.com/apache/spark/pull/20419#issuecomment-361359825 , and 
verified that under the current implementation, this PR will not affect the 
codegen cache hit behavior.
    
    But the way it's working is a bit brittle. The `body` field of the 
`CodeAndComment` returned from `WholeStageCodegenExec.doCodeGen()` looks like 
the following, when `spark.sql.codegen.comments=false && 
spark.sql.codegen.useIdInClassName=false`:
    ```java
    /*wholestagecodegen_c*/
    final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
    ```
    Notice the `/*wholestagecodegen_c*/` placeholder comment. It's the artifact 
produced by the `ctx.registerComment()` call, where `wholestagecodegen_` is a 
prefix automatically derived from the calling class, and `c` is from a 
`ctx.freshName("c")`, subject to an integer suffix when names collide.
    
    The way it's used right now, because from `WholeStageCodegenExec` we're 
only making very few calls to `ctx.registerComment()` and all of them happen in 
a deterministic order, that's how this identifier embedded in the placeholder 
comment would be the same no matter what codegen stage it is.
    
    But let's assume if someone doesn't know about this, and then accidentally 
added code in `WholeStageCodegenExec` that conditionally calls 
`ctx.registerComment()`, then this placeholder comment is NOT guaranteed to be 
the same, and that will affect the codegen cache hit behavior.
    
    I'd say functional wise this PR is good to go; maybe we want to add a 
comment somewhere in `WholeStageCodegenExec` to note the subtlety, or maybe 
think about an alternative sometime in the future.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

Reply via email to