[GitHub] [spark] ulysses-you commented on a diff in pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread GitBox



ulysses-you commented on code in PR #36856:
URL: https://github.com/apache/spark/pull/36856#discussion_r896371376


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -53,6 +53,17 @@ case class UnaryMinus(
   override def toString: String = s"-$child"
 
   private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)
+  private lazy val unaryMinusFunc: Any => Any = dataType match {

Review Comment:
   @srowen @LuciferYang
   This issue mostly happens in arithmetic-like expressions, and I also tried to
collect other related expressions. I hope I haven't missed any. We should still
keep this in mind when reviewing newly added expressions or newly supported data
types, since they may reintroduce the same issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] ulysses-you commented on a diff in pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread GitBox



ulysses-you commented on code in PR #36856:
URL: https://github.com/apache/spark/pull/36856#discussion_r89692


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -53,6 +53,17 @@ case class UnaryMinus(
   override def toString: String = s"-$child"
 
   private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)
+  private lazy val unaryMinusFunc: Any => Any = dataType match {

Review Comment:
   Do you mean rewriting it as:
   ```scala
   private def unaryMinusFunc: Any => Any = dataType match {
 ..
   ```
   
   The reason I pulled it out as a lazy val is that the data type is known
before `eval` runs in an expression. Say the data type is integer:
   - with a lazy val, the function is reduced to `input => numeric.negate(input)`
during execution;
   - with a `def`, every call during execution evaluates:
 ```scala
 dataType match {
   // ... other data type cases
   case _ => input => numeric.negate(input)
 }
 ```
   
   The overhead is not in creating the function but in the data type matching,
which the lazy val eliminates. So I think the lazy val is more efficient?
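The argument above can be checked with a small instrumented sketch. It assumes a hypothetical counter `matchCount` that records how often the data type match runs; `NumType`, `NegateBase`, `PerCallNegate`, and `CachedNegate` are likewise hypothetical and only approximate `UnaryMinus`. With a `def`, the match is re-evaluated on every call; with a `lazy val`, it runs once no matter how many rows are processed.

```scala
// Hypothetical data types standing in for Catalyst's integral types.
sealed trait NumType
case object IntNum extends NumType
case object LongNum extends NumType

// Shared logic: counts how many times the data type match is evaluated.
abstract class NegateBase(val dataType: NumType) {
  var matchCount = 0

  protected def selectFunc(): Any => Any = {
    matchCount += 1
    dataType match {
      case IntNum  => input => -input.asInstanceOf[Int]
      case LongNum => input => -input.asInstanceOf[Long]
    }
  }

  def negateFunc: Any => Any
  def eval(input: Any): Any = negateFunc(input)
}

// `def`: the match runs again on every eval call.
class PerCallNegate(dt: NumType) extends NegateBase(dt) {
  def negateFunc: Any => Any = selectFunc()
}

// `lazy val`: the match runs once; later calls reuse the cached function.
class CachedNegate(dt: NumType) extends NegateBase(dt) {
  lazy val negateFunc: Any => Any = selectFunc()
}
```

After evaluating 1000 rows through each variant, `PerCallNegate` has matched 1000 times while `CachedNegate` has matched once, which is the per-row overhead the lazy val removes.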



[GitHub] [spark] ulysses-you commented on a diff in pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching

2022-06-13 Thread GitBox



ulysses-you commented on code in PR #36856:
URL: https://github.com/apache/spark/pull/36856#discussion_r896301945


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -53,6 +53,17 @@ case class UnaryMinus(
   override def toString: String = s"-$child"
 
   private lazy val numeric = TypeUtils.getNumeric(dataType, failOnError)
+  private lazy val unaryMinusFunc: Any => Any = dataType match {

Review Comment:
   Hmm, this is called inside `eval`, so the function is cached to avoid
re-running the data type match for every row.
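A sketch of where the cached function sits in the per-row path, assuming a simplified `eval`/`nullSafeEval` split modeled loosely on Catalyst's unary expressions, where a null input short-circuits before `nullSafeEval` is called. The class `ToyUnaryMinus`, the `ColType` tags, and the `initCount` counter are hypothetical, and rows are modeled as plain values with `null` standing for SQL NULL. Constructing the expression does not force the lazy val; the first non-null row does, and every later row reuses it.

```scala
// Hypothetical type tags standing in for Catalyst data types.
sealed trait ColType
case object IntCol extends ColType
case object DoubleCol extends ColType

// Simplified unary minus; each "row" is a single value (null = SQL NULL).
final class ToyUnaryMinus(dataType: ColType) {
  var initCount = 0 // how many times the data type match has run

  private lazy val unaryMinusFunc: Any => Any = {
    initCount += 1
    dataType match {
      case IntCol    => v => -v.asInstanceOf[Int]
      case DoubleCol => v => -v.asInstanceOf[Double]
    }
  }

  // Mirrors the eval -> nullSafeEval split: nulls short-circuit before the
  // cached function is touched.
  def eval(value: Any): Any =
    if (value == null) null else nullSafeEval(value)

  private def nullSafeEval(value: Any): Any = unaryMinusFunc(value)
}
```

Because the initializer only runs on first access, an expression that never sees a non-null row never pays for the match at all.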



