Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154866923 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import scala.collection.mutable + +import org.apache.spark.sql.catalyst.expressions._ + +/** + * Defines APIs used in expression code generation. + */ +object ExpressionCodegen { + + /** + * Given an expression, returns the all necessary parameters to evaluate it, so the generated + * code of this expression can be split in a function. + * The 1st string in returned tuple is the parameter strings used to call the function. + * The 2nd string in returned tuple is the parameter strings used to declare the function. + * + * Returns `None` if it can't produce valid parameters. + * + * Params to include: + * 1. Evaluated columns referred by this, children or deferred expressions. + * 2. Rows referred by this, children or deferred expressions. + * 3. Eliminated subexpressions referred bu children expressions. + */ + def getExpressionInputParams( + ctx: CodegenContext, + expr: Expression): Option[(Seq[String], Seq[String])] = { + val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr) + val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars) + + val subExprs = getSubExprInChildren(ctx, expr) + val subExprCodes = getSubExprCodes(ctx, subExprs) + val paramsFromSubExprs = prepareFunctionParams(ctx, subExprs, subExprCodes) --- End diff -- If I get rid of the part of extracting subexpressions as parameters, some tests will be failed. Because hash aggregation uses subexpression elimination under wholestage codegen, this PR enables splitting expression codes under wholestage codegen, so it inevitably needs to include subexpression parameters. This PR doesn't yet to support `splitExpressions` in wholestage codegen. This is only applied to split codes of deeply nested expression now. But the added API can be applied to `splitExpressions` later. That is a TODO.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org