Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20419 I did the experiment that I asked about at https://github.com/apache/spark/pull/20419#issuecomment-361359825 , and verified that under the current implementation, this PR will not affect the codegen cache hit behavior. But the way it's working is a bit brittle. The `body` field of the `CodeAndComment` returned from `WholeStageCodegenExec.doCodeGen()` looks like the following, when `spark.sql.codegen.comments=false && spark.sql.codegen.useIdInClassName=false`: ```java /*wholestagecodegen_c*/ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator { ``` Notice the `/*wholestagecodegen_c*/` placeholder comment. It's the artifact produced by the `ctx.registerComment()` call, where `wholestagecodegen_` is a prefix automatically derived from the calling class, and `c` is from a `ctx.freshName("c")`, subject to an integer suffix when names collide. The way it's used right now, because from `WholeStageCodegenExec` we're only making very few calls to `ctx.registerComment()` and all of them happen in a deterministic order, that's how this identifier embedded in the placeholder comment would be the same no matter what codegen stage it is. But let's assume if someone doesn't know about this, and then accidentally added code in `WholeStageCodegenExec` that conditionally calls `ctx.registerComment()`, then this placeholder comment is NOT guaranteed to be the same, and that will affect the codegen cache hit behavior. I'd say functional wise this PR is good to go; maybe we want to add a comment somewhere in `WholeStageCodegenExec` to note the subtlety, or maybe think about an alternative sometime in the future.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org