[ 
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221379#comment-14221379
 ] 

Debasish Das commented on SPARK-3066:
-------------------------------------

I ran experiments on the MovieLens 1M dataset with varying rank, using Spark in
local mode on a machine with 4 GB RAM and 4 cores, to see how much MAP improves
as the rank is increased.

===
rank=10 (default)

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 799747, test: 200462.
Test RMSE = 0.8528377625407709.
Test users 6036 MAP 0.03851426277536059

Runtime: 30 s

===
rank=25

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800417, test: 199792.
Test RMSE = 0.8518001349769724.
Test users 6037 MAP 0.04508057348514959

Runtime: 30 s

===
rank=50

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800823, test: 199386.
Test RMSE = 0.8487416471685229.
Test users 6038 MAP 0.05145126538369158

Runtime: 42 s

===
rank=100

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800720, test: 199489.
Test RMSE = 0.8508095863317275.
Test users 6033 MAP 0.0561225429735388

Runtime: 1.5 min

===
rank=150

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800257, test: 199952.
Test RMSE = 0.8435902056186158.
Test users 6035 MAP 0.05855252471878818

Runtime: 3.6 min

===
rank=200

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800356, test: 199853.
Test RMSE = 0.8452385688272382.
Test users 6037 MAP 0.059176892052172934

Runtime: 7.4 min
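
To make the trade-off easier to read, here is a small sketch that puts the six
runs side by side and computes the MAP gain and runtime multiple relative to
rank=10. The numbers are copied from the output above; runtimes given in
minutes are converted to seconds (1.5 min = 90 s, 3.6 min = 216 s,
7.4 min = 444 s). The function name `summarize` is only illustrative.

```python
# (rank, test RMSE, MAP, runtime in seconds), copied from the runs above.
# Runtimes reported in minutes have been converted to seconds.
runs = [
    (10, 0.8528377625407709, 0.03851426277536059, 30),
    (25, 0.8518001349769724, 0.04508057348514959, 30),
    (50, 0.8487416471685229, 0.05145126538369158, 42),
    (100, 0.8508095863317275, 0.0561225429735388, 90),
    (150, 0.8435902056186158, 0.05855252471878818, 216),
    (200, 0.8452385688272382, 0.059176892052172934, 444),
]


def summarize(runs):
    """Return one dict per run with MAP gain and runtime relative to the first run."""
    _, _, base_map, base_secs = runs[0]
    rows = []
    for rank, rmse, map_score, secs in runs:
        rows.append({
            "rank": rank,
            "rmse": rmse,
            "map": map_score,
            # Percentage MAP improvement over the baseline (first) run.
            "map_gain_pct": 100.0 * (map_score / base_map - 1.0),
            # How many times longer this run took than the baseline.
            "runtime_x": secs / base_secs,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(runs):
        print("rank=%-4d RMSE=%.4f MAP=%.4f MAP gain=%+6.1f%% runtime=%.1fx" % (
            row["rank"], row["rmse"], row["map"],
            row["map_gain_pct"], row["runtime_x"]))
```

Read this way, rank=200 gives a MAP gain of a bit over 50% relative to rank=10
at about 14.8x the runtime on this machine. Each run used a different random
train/test split, so small differences between neighbouring ranks should be
treated with care.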

I will also run the MovieLens 10M and Netflix datasets and report the numbers
for varying ranks. Those runs need a cluster.
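
For readers less familiar with the metric: the exact MAP definition and cutoff
used for the runs above are not spelled out in this comment, so the following
is only a sketch of one common definition (average precision per test user,
averaged over users who have at least one relevant item). The function names
and the dict-based inputs are assumptions made for illustration and are not
the code that produced the numbers above.

```python
def average_precision(ranked_items, relevant_items):
    """Average precision of one ranked list (assumed free of duplicates)."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            # Precision at this position, counted only where a hit occurs.
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_average_precision(recommendations, ground_truth):
    """Mean of average_precision over users with at least one relevant item.

    recommendations: dict user -> ranked list of recommended items
    ground_truth:    dict user -> collection of relevant (held-out) items
    """
    users = [u for u in ground_truth if ground_truth[u]]
    if not users:
        return 0.0
    total = sum(
        average_precision(recommendations.get(u, []), ground_truth[u])
        for u in users
    )
    return total / len(users)


if __name__ == "__main__":
    recs = {1: ["a", "b", "c"], 2: ["x", "a"]}
    truth = {1: {"a", "c"}, 2: {"a"}}
    print(mean_average_precision(recs, truth))
```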

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>
> ALS returns a matrix factorization model, which we can use to predict ratings 
> for individual queries as well as small batches. In practice, users may want 
> to compute top-k recommendations offline for all users. It is very expensive 
> but a common problem. We can do some optimization like
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
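
The three steps in the issue description can be sketched on a single machine
as follows. This is plain Python and not Spark code: the dict of product
factors stands in for the collected and broadcast side, the nested loop in
`score_block` stands in for a single level-3 BLAS (GEMM) call, and
`heapq.nlargest` stands in for the top-k selection. All names here
(`score_block`, `recommend_all`, `block_size`) are hypothetical.

```python
import heapq


def score_block(user_block, product_matrix):
    """Scores for a block of users against all products.

    Computes (users x rank) times (rank x products); in a real implementation
    this is the part a single GEMM call would handle.
    """
    return [
        [sum(u * p for u, p in zip(user_vec, prod_vec))
         for prod_vec in product_matrix]
        for user_vec in user_block
    ]


def recommend_all(user_factors, product_factors, k, block_size=2):
    """Return dict user -> list of (product, score), best first, length <= k."""
    user_ids = list(user_factors)
    product_ids = list(product_factors)
    # Step 1: materialise one side as a dense matrix (the "broadcast" side).
    product_matrix = [product_factors[p] for p in product_ids]

    result = {}
    for start in range(0, len(user_ids), block_size):
        block_ids = user_ids[start:start + block_size]
        # Step 2: score a whole block of users in one matrix product.
        scores = score_block([user_factors[u] for u in block_ids],
                             product_matrix)
        # Step 3: keep only the k highest-scoring products per user.
        for uid, row in zip(block_ids, scores):
            result[uid] = heapq.nlargest(
                k, zip(product_ids, row), key=lambda pair: pair[1])
    return result


if __name__ == "__main__":
    users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
    products = {"p1": [2.0, 0.0], "p2": [0.0, 3.0], "p3": [0.5, 0.5]}
    for uid, top in recommend_all(users, products, k=2).items():
        print(uid, top)
```

Processing users in blocks keeps the intermediate score matrix bounded at
block_size x (number of products) instead of materialising scores for all
user-product pairs at once.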



