This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push: new afd99d19a2b8 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 afd99d19a2b8 is described below commit afd99d19a2b85dda2245d3557506d1090187c5f4 Author: Zhen Wang <643348...@qq.com> AuthorDate: Fri Apr 19 10:53:16 2024 +0800 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 ### What changes were proposed in this pull request? Fix `ExpressionSet` performance regression in scala 2.12. ### Why are the changes needed? The implementation of the `SetLike.++` method in scala 2.12 is to iteratively execute the `+` method. The `ExpressionSet.+` method first clones a new object and then adds element, which is very expensive. https://github.com/scala/scala/blob/ceaf7e68ac93e9bbe8642d06164714b2de709c27/src/library/scala/collection/SetLike.scala#L186 After https://github.com/apache/spark/pull/36121, the `++` and `--` methods in ExpressionSet of scala 2.12 were removed, causing performance regression. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Benchmark code: ``` object TestBenchmark { def main(args: Array[String]): Unit = { val count = 300 val benchmark = new Benchmark("Test ExpressionSetV2 ++ ", count) val aUpper = AttributeReference("A", IntegerType)(exprId = ExprId(1)) var initialSet = ExpressionSet((0 until 300).map(i => aUpper + i)) val setToAddWithSameDeterministicExpression = ExpressionSet((0 until 300).map(i => aUpper + i)) benchmark.addCase("Test ++", 10) { _: Int => for (_ <- 0L until count) { initialSet ++= setToAddWithSameDeterministicExpression } } benchmark.run() } } ``` before this change: ``` OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64 Intel Core Processor (Skylake, IBRS) Test ExpressionSetV2 ++ : Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Test ++ 1577 1691 61 0.0 5255516.0 1.0X ``` after this change: ``` OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-957.el7.x86_64 Intel Core Processor (Skylake, IBRS) Test ExpressionSetV2 ++ : Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ Test ++ 14 14 0 0.0 45395.2 1.0X ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46114 from wForget/SPARK-47897. Authored-by: Zhen Wang <643348...@qq.com> Signed-off-by: Kent Yao <y...@apache.org> --- .../sql/catalyst/expressions/ExpressionSet.scala | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala index 3e545f745bae..c18679330f3a 100644 --- a/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala +++ b/sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala @@ -17,7 +17,7 @@ package org.apache.spark.sql.catalyst.expressions -import scala.collection.mutable +import scala.collection.{mutable, GenTraversableOnce} import scala.collection.mutable.ArrayBuffer object ExpressionSet { @@ -108,12 +108,31 @@ class ExpressionSet protected( newSet } + /** + * SPARK-47897: In Scala 2.12, the `SetLike.++` method iteratively calls `+` method. + * `ExpressionSet.+` is expensive, so we override `++`. + */ + override def ++(elems: GenTraversableOnce[Expression]): ExpressionSet = { + val newSet = clone() + elems.foreach(newSet.add) + newSet + } + override def -(elem: Expression): ExpressionSet = { val newSet = clone() newSet.remove(elem) newSet } + /** + * SPARK-47897: We need to override `--` like `++`. + */ + override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = { + val newSet = clone() + elems.foreach(newSet.remove) + newSet + } + def map(f: Expression => Expression): ExpressionSet = { val newSet = new ExpressionSet() this.iterator.foreach(elem => newSet.add(f(elem))) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org