This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from f6c4e58b85d [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition
     add 08678456d16 [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ml/recommendation/ALS.scala   |  18 ++--
 .../ml/recommendation/TopByKeyAggregator.scala     |  59 -----------
 .../spark/ml/recommendation/CollectTopKSuite.scala | 111 +++++++++++++++++++++
 .../recommendation/TopByKeyAggregatorSuite.scala   |  73 --------------
 .../catalyst/expressions/aggregate/collect.scala   |  46 ++++++++-
 .../scala/org/apache/spark/sql/functions.scala     |   3 +
 6 files changed, 169 insertions(+), 141 deletions(-)
 delete mode 100644 mllib/src/main/scala/org/apache/spark/ml/recommendation/TopByKeyAggregator.scala
 create mode 100644 mllib/src/test/scala/org/apache/spark/ml/recommendation/CollectTopKSuite.scala
 delete mode 100644 mllib/src/test/scala/org/apache/spark/ml/recommendation/TopByKeyAggregatorSuite.scala

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org