Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20224#discussion_r163453639
  
    --- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
    @@ -228,4 +229,21 @@ class WholeStageCodegenSuite extends QueryTest with 
SharedSQLContext {
           }
         }
       }
    +
    +  test("including codegen stage ID in generated class name should not 
regress codegen caching") {
    +    import testImplicits._
    +
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_USE_ID_IN_CLASS_NAME.key -> 
"true") {
    +      val bytecodeSizeHisto = 
CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE
    +      spark.range(3).select('id + 2).collect
    +      val after1 = bytecodeSizeHisto.getCount
    +      spark.range(3).select('id + 2).collect
    +      val after2 = bytecodeSizeHisto.getCount // same query shape as 
above, deliberately
    +      assert(after1 == after2, "the same query run twice should hit the 
codegen cache")
    +
    +      spark.range(5).select('id * 2).collect
    +      val after3 = bytecodeSizeHisto.getCount
    +      assert(after3 >= after2, "a different query can result in codegen 
cache miss, that's okay")
    --- End diff --
    
    I actually deliberately wrote it this way. Note how I phrased in the 
assertion message as "can result in codegen cache miss" instead of "will result 
in".
    
    That's because the code shape of this third query was deliberately chosen 
to be similar to the two queries before it: all three have 
`spark.range(some_const).select(some_expr).collect`, so if any future changes 
to codegen of `Range` or `Project` operators affect how much specialized code 
(such as constant values) we directly embed into the code, it's actually 
possible to this third query to generate the same code as the first two, which 
will result in a codegen cache miss.
    
    So I'm making this check a bit loose. It's just there to indicate that it's 
acceptable to for a different query to encounter a codegen cache miss. WYDT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to