Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19813#discussion_r154866923
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala
 ---
    @@ -0,0 +1,237 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.codegen
    +
    +import scala.collection.mutable
    +
    +import org.apache.spark.sql.catalyst.expressions._
    +
    +/**
    + * Defines APIs used in expression code generation.
    + */
    +object ExpressionCodegen {
    +
    +  /**
    +   * Given an expression, returns the all necessary parameters to evaluate 
it, so the generated
    +   * code of this expression can be split in a function.
    +   * The 1st string in returned tuple is the parameter strings used to 
call the function.
    +   * The 2nd string in returned tuple is the parameter strings used to 
declare the function.
    +   *
    +   * Returns `None` if it can't produce valid parameters.
    +   *
    +   * Params to include:
    +   * 1. Evaluated columns referred by this, children or deferred 
expressions.
    +   * 2. Rows referred by this, children or deferred expressions.
    +   * 3. Eliminated subexpressions referred by children expressions.
    +   */
    +  def getExpressionInputParams(
    +      ctx: CodegenContext,
    +      expr: Expression): Option[(Seq[String], Seq[String])] = {
    +    val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr)
    +    val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, 
inputVars)
    +
    +    val subExprs = getSubExprInChildren(ctx, expr)
    +    val subExprCodes = getSubExprCodes(ctx, subExprs)
    +    val paramsFromSubExprs = prepareFunctionParams(ctx, subExprs, 
subExprCodes)
    --- End diff --
    
    If I remove the part that extracts subexpressions as parameters, some 
tests fail.
    
    Hash aggregation uses subexpression elimination under whole-stage 
codegen, and this PR enables splitting expression code under whole-stage 
codegen, so the split functions inevitably need to take the eliminated 
subexpressions as parameters.
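    
    To illustrate the point above, here is a minimal, self-contained sketch 
(not the code in this PR). It assumes the value of an eliminated subexpression 
lives in a local variable of the enclosing generated method, so a split-out 
function can only see it if it is passed in. `Param`, `SplitSketch`, 
`functionParams`, `split` and the variable names are hypothetical and are not 
Spark APIs.

```scala
// Hypothetical, simplified stand-in for a generated Java variable.
// This is NOT Spark's ExprCode; it only carries a Java type and a name.
final case class Param(javaType: String, name: String)

object SplitSketch {
  // Builds the two parameter lists described in the Scaladoc of the diff:
  // the first is used at the call site, the second in the declaration.
  def functionParams(params: Seq[Param]): (Seq[String], Seq[String]) =
    (params.map(_.name), params.map(p => s"${p.javaType} ${p.name}"))

  // Wraps `body` in a private Java method and returns (declaration, call).
  def split(
      funcName: String,
      returnType: String,
      body: String,
      params: Seq[Param]): (String, String) = {
    val (callArgs, declParams) = functionParams(params)
    val decl =
      s"""private $returnType $funcName(${declParams.mkString(", ")}) {
         |  $body
         |}""".stripMargin
    val call = s"$funcName(${callArgs.mkString(", ")})"
    (decl, call)
  }

  def main(args: Array[String]): Unit = {
    // An evaluated input column and an eliminated subexpression, both assumed
    // to be locals of the enclosing generated method.
    val columns = Seq(Param("int", "inputCol_a"))
    val subExprs = Seq(Param("long", "subExprValue_0"))

    // The body refers to both names, so both must become parameters.
    // Leaving out `subExprs` would produce Java that does not compile.
    val (decl, call) = split(
      "nestedExpr_0", "long",
      "return inputCol_a + subExprValue_0;",
      columns ++ subExprs)

    println(decl)
    println(call)
  }
}
```

    If only `columns` were passed, the body would refer to a name that is not 
in scope inside the new function, which is the kind of failure described above.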
    
    This PR doesn't yet support `splitExpressions` in whole-stage codegen. 
For now it is only applied to splitting the code of deeply nested expressions, 
but the added API can be applied to `splitExpressions` later. That is a TODO.
    



---

