[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-13 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631953911



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+  i += 1
+}
+if (needNullCheck && evaluatedArgs.contains(null)) {
   // return null if one of arguments is null
   null
 } else {
   val ret = try {
-method.invoke(obj, args: _*)
+method.invoke(obj, evaluatedArgs: _*)
   } catch {

Review comment:
   Hmm, I'm not sure. Looking at the usages of `Invoke`, 
`targetObject.dataType` is usually `ObjectType` (for instance, in 
`ScalarFunction` we wrap the UDF into a `Literal` with `ObjectType`), so 
I'm curious how useful this would be and when we'd use 
`StringType`/`DecimalType` for the `targetObject`.
   
   Looking at the profiling result for `Invoke.eval`, it is now dominated by 
`InvokeLike.invoke`:
   
   https://user-images.githubusercontent.com/506679/118157789-d8183780-b3cf-11eb-92ae-bd9e39988c9c.png
   
   That said, this is somewhat unrelated to the above: `V2FunctionBenchmark` 
(and `ScalarFunction`) uses `ObjectType` for `Invoke`, so that case is already 
handled by the current code:
   ```scala
   @transient lazy val method = targetObject.dataType match {
     case ObjectType(cls) =>
       Some(findMethod(cls, encodedFunctionName, argClasses))
     case _ => None
   }
   ```
   we may need new benchmarks if we decide to do this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631576884



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+  i += 1
+}
+if (needNullCheck && evaluatedArgs.contains(null)) {
   // return null if one of arguments is null
   null
 } else {
   val ret = try {
-method.invoke(obj, args: _*)
+method.invoke(obj, evaluatedArgs: _*)
   } catch {

Review comment:
   I'm not sure we can do a similar thing in `Invoke.eval`, though, 
since `obj` in `obj.getClass.getMethod(functionName, argClasses: _*)` is 
different for each call.
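
   To illustrate the distinction above, here is a minimal sketch (not Spark's 
code) contrasting a method that is resolved once from a statically known class 
with one that must be looked up from the runtime object on every call. The 
`MethodLookup` object and its two helpers are hypothetical names introduced 
only for this example; the reflection calls are the standard 
`java.lang.Class.getMethod` and `java.lang.reflect.Method.invoke`.

   ```scala
   import java.lang.reflect.Method

   // Hypothetical helper object, not part of Spark.
   object MethodLookup {
     // Resolve once: only possible when the target class is known up front.
     def resolveOnce(cls: Class[_], name: String, argClasses: Seq[Class[_]]): Method =
       cls.getMethod(name, argClasses: _*)

     // Resolve per call: needed when only the runtime object reveals the class,
     // so the reflective lookup cost is paid on every invocation.
     def invokePerCall(
         obj: AnyRef,
         name: String,
         argClasses: Seq[Class[_]],
         args: Seq[AnyRef]): AnyRef =
       obj.getClass.getMethod(name, argClasses: _*).invoke(obj, args: _*)
   }
   ```

   With `resolveOnce`, the returned `Method` can be kept and reused for every 
row; with `invokePerCall`, nothing can be kept unless the lookup is cached by 
class, which is a separate design choice not shown here.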







[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631565642



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+  i += 1
+}
+if (needNullCheck && evaluatedArgs.contains(null)) {
   // return null if one of arguments is null
   null
 } else {
   val ret = try {
-method.invoke(obj, args: _*)
+method.invoke(obj, evaluatedArgs: _*)
   } catch {

Review comment:
   Yea, let me try it. In the profiling after this PR, `HashMap.get` accounts 
for 7.82% of the entire `invoke` call, so it seems worthwhile to do this.







[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631540561



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]

Review comment:
   Yea, even though we evaluate `arguments` for each `invoke` call, we can 
reuse the same array to store the evaluation results. I guess that's better 
than allocating a new `Array[Object]` for each input row.
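
   As a rough sketch of the buffer-reuse idea (not the actual Spark 
implementation), the snippet below allocates the `Array[Object]` once and 
overwrites it for each row. `Expr`, `Col`, and `ArgEvaluator` are hypothetical 
stand-ins invented for this example; only the `while` loop mirrors the diff 
above.

   ```scala
   // Hypothetical stand-in for an expression evaluated per row; not Spark's Expression.
   trait Expr { def eval(row: Array[Any]): Any }

   // Hypothetical expression that reads the i-th field of the row.
   final case class Col(i: Int) extends Expr {
     def eval(row: Array[Any]): Any = row(i)
   }

   final class ArgEvaluator(arguments: IndexedSeq[Expr]) {
     // Allocated once per evaluator and overwritten for every row.
     private val evaluatedArgs = new Array[Object](arguments.length)

     // Evaluates all arguments into the shared buffer and returns that buffer.
     // The caller must consume the result before the next call, and the
     // evaluator must not be shared across threads.
     def evalInto(row: Array[Any]): Array[Object] = {
       var i = 0
       val len = arguments.length
       while (i < len) {
         evaluatedArgs(i) = arguments(i).eval(row).asInstanceOf[Object]
         i += 1
       }
       evaluatedArgs
     }
   }
   ```

   The trade-off is that the returned array is only valid until the next 
call, so this fits a per-row evaluation loop but not code that retains the 
arguments.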







[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631523221



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]

Review comment:
   You mean just use `val`?







[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke

2021-05-12 Thread GitBox


sunchao commented on a change in pull request #32527:
URL: https://github.com/apache/spark/pull/32527#discussion_r631501337



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
##
@@ -127,13 +128,19 @@ trait InvokeLike extends Expression with NonSQLExpression {
   arguments: Seq[Expression],
   input: InternalRow,
   dataType: DataType): Any = {
-val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-if (needNullCheck && args.exists(_ == null)) {
+var i = 0
+val len = arguments.length
+while (i < len) {
+  val e = arguments(i)

Review comment:
   Yea, let me remove it.



