Hi, I'm testing collaborative filtering with MLlib. Building a model with ALS.trainImplicit (or train) scales well as far as I have tested, but I'm wondering how I can get all the recommendation results efficiently.
The predictAll method can compute all the results, but it needs the whole user-product matrix in memory as input. With 1 million users and 1 million products, that matrix has 10^12 elements, and even at only 4 bytes per element it takes roughly 4 TB ((1,000,000 * 1,000,000) * 4 B = 4 * 10^12 B), which is not a realistic amount of memory even now.

We can, of course, call the predict method once per user, but as far as I have tried, getting results for 1 million users that way is very slow.

Am I missing something? Is there a better, scalable and efficient way to get all the recommendation results?

Best regards,
Hiro
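P.S. For reference, here is the back-of-the-envelope calculation behind the 4 TB figure, in plain Python. The 4-byte element size is my assumption for a single-precision score per user-product pair:

```python
# Memory needed to materialize the full user-product prediction matrix.
users = 1_000_000
products = 1_000_000
bytes_per_element = 4  # assumed: one 4-byte (single-precision) score per pair

total_bytes = users * products * bytes_per_element
print(total_bytes)                   # 4000000000000 bytes (4 * 10**12)
print(total_bytes / 1000**4, "TB")   # 4.0 TB
```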