Nick Pentreath created SPARK-20587:
--------------------------------------

             Summary: Improve performance of ML ALS recommendForAll
                 Key: SPARK-20587
                 URL: https://issues.apache.org/jira/browse/SPARK-20587
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.0
            Reporter: Nick Pentreath
            Assignee: Nick Pentreath

SPARK-11968 relates to excessive GC pressure caused by the "blocked BLAS 3" approach to generating top-k recommendations in {{mllib.recommendation.MatrixFactorizationModel}}.

The solution there still blocks the factors, but it first computes the top-k elements *per block* efficiently (using {{BoundedPriorityQueue}}) and then computes the global top-k elements from those per-block results. This substantially improves performance and reduces GC pressure for {{mllib}}'s ALS model.

The same approach is also much more efficient than the "crossJoin and score per-row" approach currently used in {{ml}}'s {{DataFrame}}-based method. This issue adapts the solution in SPARK-11968 for {{DataFrame}}.
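
The two-stage idea described above (top-k per block first, then a global top-k over the per-block results) can be illustrated with a small sketch. This is not the Spark implementation: it is plain single-machine Python, it uses ordinary dot products rather than BLAS, and {{heapq.nlargest}} stands in for {{BoundedPriorityQueue}}. The names {{recommend_for_all}}, {{blockify}} and {{block_size}} are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import product


def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def blockify(factors, block_size):
    """Split a list of (id, vector) pairs into consecutive blocks."""
    return [factors[i:i + block_size]
            for i in range(0, len(factors), block_size)]


def recommend_for_all(user_factors, item_factors, k, block_size=2):
    """Return {user_id: [(item_id, score), ...]} with the k best items per user.

    Hypothetical helper: a single-process illustration, not Spark's API.
    """
    user_blocks = blockify(user_factors, block_size)
    item_blocks = blockify(item_factors, block_size)

    # Stage 1: for every (user block, item block) pair, keep only the
    # top-k scored items per user, so at most k candidates survive per pair.
    candidates = {}
    for user_block, item_block in product(user_blocks, item_blocks):
        for user_id, u in user_block:
            scored = ((dot(u, v), item_id) for item_id, v in item_block)
            block_top = heapq.nlargest(k, scored)
            candidates.setdefault(user_id, []).extend(block_top)

    # Stage 2: merge the per-block candidates into the global top-k per user.
    return {
        user_id: [(item_id, score)
                  for score, item_id in heapq.nlargest(k, cands)]
        for user_id, cands in candidates.items()
    }


users = [(0, [1.0, 0.0]), (1, [0.0, 1.0]), (2, [1.0, 1.0])]
items = [(10, [3.0, 1.0]), (11, [1.0, 4.0]), (12, [2.0, 2.0]),
         (13, [0.5, 0.5]), (14, [5.0, 0.0])]
recs = recommend_for_all(users, items, k=2, block_size=2)
```

Because each block pair contributes at most k candidates per user, the merge step only ever sees a small number of scores per user, instead of one scored row for every user-item pair as in a full cross join.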