Hi there,

Robin recently mentioned JBlas [1] when we talked about the performance of our vector operations, so I ported the solving code for ALS to JBlas today and got some awesome results.
For the MovieLens 1M dataset and a factorization of rank 100, the runtime per iteration dropped from 50 seconds to less than 7 seconds. I will run some tests with the distributed version and larger datasets in the next few days, but from what I've seen so far, we should really take a closer look at JBlas, at least for operations on dense matrices.

Best,
Sebastian

[1] http://mikiobraun.github.io/jblas/