[GitHub] [spark] ulysses-you commented on a diff in pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching
ulysses-you commented on code in PR #36856:
URL: https://github.com/apache/spark/pull/36856#discussion_r896371376

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ##
@@ -53,6 +53,17 @@ case class UnaryMinus(
   override def toString: String = s"-$child"
   private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)
+  private lazy val unaryMinusFunc: Any => Any = dataType match {

Review Comment:
@srowen @LuciferYang This issue mostly occurs in arithmetic-like expressions, and I have also tried to collect the other related expressions. I hope I have not missed any. We should still keep this in mind when reviewing newly added expressions or newly supported data types, which may introduce this issue again.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] ulysses-you commented on a diff in pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching
ulysses-you commented on code in PR #36856:
URL: https://github.com/apache/spark/pull/36856#discussion_r89692

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ##
@@ -53,6 +53,17 @@ case class UnaryMinus(
   override def toString: String = s"-$child"
   private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)
+  private lazy val unaryMinusFunc: Any => Any = dataType match {

Review Comment:
Do you mean rewriting it like this?
```scala
private def unaryMinusFunc: Any => Any = dataType match {
..
```
The reason I pulled it out into a lazy val is that the data type is already known before `eval` is called on an expression. Say the data type is integer:
- with the lazy val, the function is reduced to `input => numeric.negate(input)` during execution
- if you declare it as a `def`, then the following runs on every call during execution:
```scala
dataType match {
  case _ => input => numeric.negate(input)
```
The overhead is not in creating the function but in the data type matching, which the lazy val eliminates. So I think the lazy val is more efficient?
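To make the `lazy val` versus `def` difference above concrete, here is a minimal standalone sketch. It does not use Spark: `SimpleType`, `NegateSketch`, `matchCount`, `evalCached` and `evalUncached` are hypothetical names invented for illustration, and the counter exists only to show how often the data type match runs.

```scala
// Hypothetical stand-ins for data types; these are NOT the Catalyst classes.
sealed trait SimpleType
case object IntType extends SimpleType
case object LongType extends SimpleType
case object DoubleType extends SimpleType

final class NegateSketch(dataType: SimpleType) {
  // Illustration only: counts how many times the data type match is evaluated.
  var matchCount: Int = 0

  // Performs the data type match and returns the specialised function.
  private def select(): Any => Any = {
    matchCount += 1
    dataType match {
      case IntType    => input => -(input.asInstanceOf[Int])
      case LongType   => input => -(input.asInstanceOf[Long])
      case DoubleType => input => -(input.asInstanceOf[Double])
    }
  }

  // lazy val: the match runs once, on first use, and the chosen function is cached.
  private lazy val cachedFunc: Any => Any = select()

  // def: the match runs again every time the function is requested.
  private def uncachedFunc: Any => Any = select()

  def evalCached(input: Any): Any = cachedFunc(input)
  def evalUncached(input: Any): Any = uncachedFunc(input)
}

object NegateSketchDemo {
  def main(args: Array[String]): Unit = {
    val cached = new NegateSketch(IntType)
    (1 to 1000).foreach(i => cached.evalCached(i))

    val uncached = new NegateSketch(IntType)
    (1 to 1000).foreach(i => uncached.evalUncached(i))

    println(s"cached matches: ${cached.matchCount}, uncached matches: ${uncached.matchCount}")
  }
}
```

In this sketch the cached variant evaluates the match once for 1000 rows, while the `def` variant evaluates it 1000 times. It says nothing about the actual speedup in Spark, which would need a benchmark.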
[GitHub] [spark] ulysses-you commented on a diff in pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching
ulysses-you commented on code in PR #36856:
URL: https://github.com/apache/spark/pull/36856#discussion_r896301945

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ##
@@ -53,6 +53,17 @@ case class UnaryMinus(
   override def toString: String = s"-$child"
   private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)
+  private lazy val unaryMinusFunc: Any => Any = dataType match {

Review Comment:
Hmm, this is called inside `eval`, so a cached function is expected here to avoid invoking the data type match for every row.