David Vogelbacher created SPARK-39746:
-----------------------------------------
             Summary: Binary array operations can be faster if one side is a constant
                 Key: SPARK-39746
                 URL: https://issues.apache.org/jira/browse/SPARK-39746
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: David Vogelbacher


Array operations such as [ArraysOverlap|https://github.com/apache/spark/blob/79f133b7bbc1d9aa6a20dd8a34ec120902f96155/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L1367] are optimized to put all the elements of the smaller array into a HashSet, provided the element type properly supports equals. However, if one of the arrays is a constant, we could do much better: instead of rebuilding the HashSet for every row, we could construct it just once and send it to all the executors. Skipping the per-row set construction would improve runtime by a constant factor.
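To make the proposal concrete, here is a minimal pure-Python sketch contrasting the two strategies. It does not use any Spark API, and the names `arrays_overlap_per_row` and `ConstantArraysOverlap` are hypothetical. The first function builds a set from the smaller array on every call, roughly like the existing per-row evaluation. The second builds the set for the constant side once and reuses it for every row, so each row only pays for the membership probes. As an assumption to keep the sketch short, SQL null semantics (arrays_overlap can return null when nulls are present) are ignored.

```python
from typing import Hashable, Iterable, Sequence


def arrays_overlap_per_row(left: Sequence[Hashable], right: Sequence[Hashable]) -> bool:
    """Hypothetical per-row evaluation: rebuild a set from the smaller array each call.

    Assumes the arrays contain no nulls (SQL null semantics are ignored).
    """
    smaller, bigger = (left, right) if len(left) <= len(right) else (right, left)
    seen = set(smaller)  # this set is reconstructed for every row
    return any(x in seen for x in bigger)


class ConstantArraysOverlap:
    """Hypothetical variant for when one side is a literal array.

    The set for the constant side is built once, when the operator is created,
    and reused for every row; only the non-constant array is iterated per row.
    Assumes the arrays contain no nulls.
    """

    def __init__(self, constant: Iterable[Hashable]) -> None:
        self._constant_set = frozenset(constant)  # built once, shared by all rows

    def __call__(self, row_value: Sequence[Hashable]) -> bool:
        # Per-row cost is only the membership probes for this row's elements.
        return any(x in self._constant_set for x in row_value)


if __name__ == "__main__":
    rows = [[1, 2, 3], [4, 5], [], [9, 1]]
    constant = [1, 9, 42]
    overlaps = ConstantArraysOverlap(constant)
    print([overlaps(r) for r in rows])  # [True, False, False, True]
    print([arrays_overlap_per_row(r, constant) for r in rows])  # [True, False, False, True]
```

Both strategies return the same answers. The difference is only where the set is built, which is why the expected gain is a constant factor rather than a change in asymptotic cost per row.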