[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...

cloud-fan Tue, 30 Oct 2018 05:21:02 -0700

GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21860#discussion_r229283365
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -854,33 +862,50 @@ case class HashAggregateExec(
     
         val updateRowInHashMap: String = {
           if (isFastHashMapEnabled) {
    -        ctx.INPUT_ROW = fastRowBuffer
    -        val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
    -        val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
    -        val effectiveCodes = subExprs.codes.mkString("\n")
    -        val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
    -          boundUpdateExpr.map(_.genCode(ctx))
    -        }
    -        val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
    -          val dt = updateExpr(i).dataType
    -          CodeGenerator.updateColumn(
    -            fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
    -        }
    +        if (isVectorizedHashMapEnabled) {
    +          ctx.INPUT_ROW = fastRowBuffer
    +          val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
    +          val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
    +          val effectiveCodes = subExprs.codes.mkString("\n")
    +          val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
    +            boundUpdateExpr.map(_.genCode(ctx))
    +          }
    +          val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
    +            val dt = updateExpr(i).dataType
    +            CodeGenerator.updateColumn(
    +              fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorized = true)
    +          }
     
    -        // If fast hash map is on, we first generate code to update row in fast hash map, if the
    -        // previous loop up hit fast hash map. Otherwise, update row in regular hash map.
    -        s"""
    -           |if ($fastRowBuffer != null) {
    -           |  // common sub-expressions
    -           |  $effectiveCodes
    -           |  // evaluate aggregate function
    -           |  ${evaluateVariables(fastRowEvals)}
    -           |  // update fast row
    -           |  ${updateFastRow.mkString("\n").trim}
    -           |} else {
    -           |  $updateRowInRegularHashMap
    -           |}
    -       """.stripMargin
    +          // If vectorized fast hash map is on, we first generate code to update row
    +          // in vectorized fast hash map, if the previous loop up hit vectorized fast hash map.
    +          // Otherwise, update row in regular hash map.
    +          s"""
    +             |if ($fastRowBuffer != null) {
    +             |  // common sub-expressions
    +             |  $effectiveCodes
    +             |  // evaluate aggregate function
    +             |  ${evaluateVariables(fastRowEvals)}
    +             |  // update fast row
    +             |  ${updateFastRow.mkString("\n").trim}
    +             |} else {
    +             |  $updateRowInRegularHashMap
    +             |}
    +          """.stripMargin
    +        } else {
    +          // If row-based hash map is on and the previous loop up hit fast hash map,
    +          // we reuse regular hash buffer to update row of fast hash map.
    +          // Otherwise, update row in regular hash map.
    +          s"""
    +             |// Updates the proper row buffer
    +             |UnsafeRow $updatedAggBuffer = null;
    --- End diff --
    
    OK, now I understand what's going on here. I still think we don't need this variable. We can generate:
    ```
    if ($fastRowBuffer != null) {
      $unsafeRowBuffer = $fastRowBuffer;
    }
    $updateRowInRegularHashMap
    ```
    And then we don't need to change `updateRowInRegularHashMap`.
    
    Note that the readability of the Scala code is more important than the readability of the generated Java code.
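
    On the Scala side, the suggestion might look roughly like the sketch below. It is only an illustration under stated assumptions: `rowBasedUpdate` is a hypothetical helper, and the buffer names and the one-line update snippet are placeholders for what the real codegen would produce. The point is that the row-based branch only re-points the regular buffer variable and then reuses the existing update code unchanged.

    ```scala
    object UpdateRowCodegenSketch {
      // Hypothetical helper: builds the Java snippet for the row-based fast hash map case.
      // `updateRowInRegularHashMap` is passed through untouched; it is assumed to write
      // only through the variable named by `unsafeRowBuffer`.
      def rowBasedUpdate(
          fastRowBuffer: String,
          unsafeRowBuffer: String,
          updateRowInRegularHashMap: String): String = {
        s"""
           |if ($fastRowBuffer != null) {
           |  $unsafeRowBuffer = $fastRowBuffer;
           |}
           |$updateRowInRegularHashMap
         """.stripMargin.trim
      }

      def main(args: Array[String]): Unit = {
        // Placeholder variable names and update code, chosen for illustration only.
        val code = rowBasedUpdate(
          fastRowBuffer = "fastAggBuffer",
          unsafeRowBuffer = "unsafeRowAggBuffer",
          updateRowInRegularHashMap = "// update unsafeRowAggBuffer in place")
        println(code)
      }
    }
    ```

    With this shape, no extra variable such as `$updatedAggBuffer` is declared in the generated code, and the Scala that produces `updateRowInRegularHashMap` stays as it was.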


