On 5/26/15 5:45 PM, Ankur Dave wrote:
This is the latest GraphX-based ALS implementation that I'm aware of:
https://github.com/ankurdave/spark/blob/GraphXALS/graphx/src/main/scala/org/apache/spark/graphx/lib/ALS.scala
When I benchmarked it last year, it was about twice as slow as MLlib's
ALS, and I think the latter has gotten faster since then. The
performance gap is because the MLlib version implements some
ALS-specific optimizations that are hard to do using GraphX, such as
storing the edges twice (partitioned by source and by destination) to
reduce communication.
Ankur <http://www.ankurdave.com/>
Great, thanks for the link and explanation!
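
For anyone reading this thread later, the "storing the edges twice" optimization Ankur mentions can be sketched roughly as below. This is only a toy illustration in plain Python, not MLlib's or GraphX's actual code. The modulo-based block assignment, the helper names (partition_edges, factors_needed), and the sample ratings are all assumptions made up for the example. The idea is that each half-step of ALS reads the copy of the edges that is already grouped by the side being solved for, so only the opposite side's factor vectors have to be sent over the network, never the edges themselves.

```python
from collections import defaultdict

def partition_edges(edges, key_index, num_blocks):
    """Group (user, item, rating) edges into blocks keyed by one endpoint.

    Block assignment by simple modulo is an assumption for illustration.
    """
    blocks = defaultdict(list)
    for edge in edges:
        blocks[edge[key_index] % num_blocks].append(edge)
    return dict(blocks)

def factors_needed(blocks, other_index):
    """For each block, list the opposite-side IDs whose factors must be shipped in."""
    return {b: sorted({e[other_index] for e in es}) for b, es in blocks.items()}

# Hypothetical toy ratings: (user_id, item_id, rating).
edges = [(0, 10, 5.0), (0, 11, 3.0), (1, 10, 4.0), (2, 11, 1.0), (3, 12, 2.0)]
num_blocks = 2

# Copy 1: partitioned by source (user), read when solving for user factors.
by_user = partition_edges(edges, key_index=0, num_blocks=num_blocks)
# Copy 2: partitioned by destination (item), read when solving for item factors.
by_item = partition_edges(edges, key_index=1, num_blocks=num_blocks)

# Only factor vectors cross block boundaries; the edges stay where they are.
item_factors_for_user_blocks = factors_needed(by_user, other_index=1)
user_factors_for_item_blocks = factors_needed(by_item, other_index=0)

print(item_factors_for_user_blocks)
print(user_factors_for_item_blocks)
```

The cost is twice the edge storage. With a single edge partitioning, as I understand the GraphX case to be, at least one of the two half-steps has to gather ratings or partial results across partitions, which is the extra communication the quoted message refers to.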