[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r970791487

## mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala:

```
@@ -153,9 +153,10 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
   def ndcgAt(k: Int): Double = {
     require(k > 0, "ranking position k should be positive")
     rdd.map { case (pred, lab, rel) =>
+      import org.apache.spark.util.collection.Utils
```

Review Comment: This is a mistake, I will fix it.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971408078

## core/src/main/scala/org/apache/spark/util/collection/Utils.scala:

```
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map
```

Review Comment: This method returns a Java map; how do we make it immutable?
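As a standalone illustration of the pattern this PR optimizes, here is a minimal sketch (not the exact Spark code) of the builder-based replacement for `keys.zip(values).toMap`. It skips materializing the intermediate zipped collection of tuples, and uses `->` for the pair so the diff's `asInstanceOf[(K, V)]` cast is unnecessary:

```scala
import scala.collection.immutable

// Builder-based equivalent of keys.zip(values).toMap: one pass,
// no intermediate collection of (K, V) tuples is allocated.
def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
  val builder = immutable.Map.newBuilder[K, V]
  val keyIter = keys.iterator
  val valueIter = values.iterator
  // Like zip, stop at the shorter of the two inputs.
  while (keyIter.hasNext && valueIter.hasNext) {
    builder += (keyIter.next() -> valueIter.next())
  }
  builder.result()
}

val m = toMap(Seq("a", "b"), Seq(1, 2))
// m == Map("a" -> 1, "b" -> 2)
```

The loop condition mirrors `zip`'s truncation semantics: extra keys or values beyond the shorter input are silently dropped, so the result matches `keys.zip(values).toMap` for unequal lengths as well.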
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971408657

## core/src/main/scala/org/apache/spark/util/collection/Utils.scala:

```
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map
```

Review Comment: Wrap it in `Collections.unmodifiableMap`?
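A minimal sketch of what the suggested wrapping might look like (hypothetical helper, not the final Spark code): the backing `HashMap` is still mutable, but the returned view rejects mutation.

```scala
import java.util.Collections

// Equivalent of keys.zip(values).toMap.asJava, but built in one pass
// and returned as an unmodifiable view.
def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
  val map = new java.util.HashMap[K, V]()
  val keyIter = keys.iterator
  val valueIter = values.iterator
  while (keyIter.hasNext && valueIter.hasNext) {
    map.put(keyIter.next(), valueIter.next())
  }
  // Callers invoking put/remove on the result get UnsupportedOperationException.
  Collections.unmodifiableMap(map)
}
```

Note `Collections.unmodifiableMap` only wraps; it does not copy, which keeps the construction cost at a single pass over the inputs.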
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971411597

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:

```
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: scala.collection.Seq[Any], values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }
```

Review Comment: `ArrayBasedMapData#toJavaMap` is never used, so I think we can delete it, but we need to confirm first that the MiMa check passes.
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971408078

## core/src/main/scala/org/apache/spark/util/collection/Utils.scala:

```
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map
```

Review Comment: ~~This method returns a Java map; how do we make it immutable?~~
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971423880

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:

```
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: scala.collection.Seq[Any], values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }
```

Review Comment: Let me check this later.
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971425920

## core/src/main/scala/org/apache/spark/util/collection/Utils.scala:

```
@@ -62,4 +63,30 @@ private[spark] object Utils {
    */
   def sequenceToOption[T](input: Seq[Option[T]]): Option[Seq[T]] =
     if (input.forall(_.isDefined)) Some(input.flatten) else None
+
+  /**
+   * Same function as `keys.zip(values).toMap`, but has perf gain.
+   */
+  def toMap[K, V](keys: Iterable[K], values: Iterable[V]): Map[K, V] = {
+    val builder = immutable.Map.newBuilder[K, V]
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      builder += (keyIter.next(), valueIter.next()).asInstanceOf[(K, V)]
+    }
+    builder.result()
+  }
+
+  /**
+   * Same function as `keys.zip(values).toMap.asJava`, but has perf gain.
+   */
+  def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V] = {
+    val map = new java.util.HashMap[K, V]()
+    val keyIter = keys.iterator
+    val valueIter = values.iterator
+    while (keyIter.hasNext && valueIter.hasNext) {
+      map.put(keyIter.next(), valueIter.next())
+    }
+    map
```

Review Comment: You are right; changed to return `Collections.unmodifiableMap(map)`.
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971460711

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:

```
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
  }

   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: scala.collection.Seq[Any], values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }
```

Review Comment: removed
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971411597

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:

```
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: scala.collection.Seq[Any], values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }
```

Review Comment: ~~`ArrayBasedMapData#toJavaMap` is never used, so I think we can delete it, but we need to confirm first that the MiMa check passes.~~

EDIT: `ArrayBasedMapData#toJavaMap` is not an unused method; it is used by `JavaTypeInference`. Sorry for missing what @mridulm said.
[GitHub] [spark] LuciferYang commented on a diff in pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][STREAMING] Optimize the performance of `keys.zip(values).toMap` code pattern
LuciferYang commented on code in PR #37876: URL: https://github.com/apache/spark/pull/37876#discussion_r971705885

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala:

```
@@ -129,20 +131,19 @@ object ArrayBasedMapData {
   def toScalaMap(map: ArrayBasedMapData): Map[Any, Any] = {
     val keys = map.keyArray.asInstanceOf[GenericArrayData].array
     val values = map.valueArray.asInstanceOf[GenericArrayData].array
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: Array[Any], values: Array[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toScalaMap(keys: scala.collection.Seq[Any], values: scala.collection.Seq[Any]): Map[Any, Any] = {
-    keys.zip(values).toMap
+    Utils.toMap(keys, values)
   }

   def toJavaMap(keys: Array[Any], values: Array[Any]): java.util.Map[Any, Any] = {
-    import scala.collection.JavaConverters._
-    keys.zip(values).toMap.asJava
+    Utils.toJavaMap(keys, values)
   }
```

Review Comment: @mridulm @cloud-fan If https://github.com/apache/spark/blob/6d067d059f3d2a62035d1b5f71ea5d87e1705643/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L338-L343 changes to

```scala
StaticInvoke(
  Utils.getClass,
  ObjectType(classOf[JMap[_, _]]),
  "toJavaMap",
  keyData :: valueData :: Nil,
  returnNullable = false)
```

then the signature of the `toJavaMap` method in `collection.Utils` needs to change from

```
def toJavaMap[K, V](keys: Iterable[K], values: Iterable[V]): java.util.Map[K, V]
```

to

```
def toJavaMap[K, V](keys: Array[K], values: Array[V]): java.util.Map[K, V]
```

Otherwise, the relevant tests fail due to:

```
16:20:35.587 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 375, Column 50: No applicable constructor/method found for actual parameters "java.lang.Object[], java.lang.Object[]"; candidates are: "public static java.util.Map org.apache.spark.util.collection.Utils.toJavaMap(scala.collection.Iterable, scala.collection.Iterable)"
```

If the method signature is `def toJavaMap[K, V](keys: Array[K], values: Array[V]): java.util.Map[K, V]`, it will limit where this method can be used, so I prefer to keep the `ArrayBasedMapData#toJavaMap` method.
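For illustration, a hypothetical `Array`-based overload like the one discussed above would match the generated code's `java.lang.Object[]` arguments, since Scala `Array[T]` erases to a JVM array rather than `scala.collection.Iterable`. This is only a sketch of the trade-off, not the code the PR settled on:

```scala
import java.util.Collections

// Hypothetical overload accepting JVM arrays directly, so codegen that
// passes java.lang.Object[] arguments can resolve the call. The cost is
// that only arrays (not arbitrary Iterables) can be passed.
def toJavaMap[K, V](keys: Array[K], values: Array[V]): java.util.Map[K, V] = {
  // Size the HashMap up front since array lengths are known.
  val map = new java.util.HashMap[K, V](keys.length)
  val n = math.min(keys.length, values.length) // mirror zip's truncation
  var i = 0
  while (i < n) {
    map.put(keys(i), values(i))
    i += 1
  }
  Collections.unmodifiableMap(map)
}
```

Keeping `ArrayBasedMapData#toJavaMap` as a thin array-typed entry point, as the comment above prefers, avoids narrowing the general `Iterable`-based utility to this shape.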