[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224871#comment-14224871 ]

Debasish Das commented on SPARK-2426:
-------------------------------------

Actually, on the MovieLens dataset I am getting good MAP numbers with the EQUALITY
constraint. The formulation is similar to PLSA but not exactly the same:

[~akopich] could you please help review whether my understanding here is correct?

k \in \{1, ..., 25\} (if we are running with rank 25)

Minimize \sum_i \sum_j (r_{ij} - w_i^T h_j)^2 + \lambda (||w_i||^2 + ||h_j||^2)
s.t. \sum_k w_{ik} = 1, w_{ik} \geq 0
     \sum_k h_{kj} = 1, h_{kj} \geq 0
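To make the per-user step concrete, here is a minimal NumPy sketch of one subproblem of this formulation: with the product factors fixed, solve for a single w_i under the EQUALITY constraint (w_i on the probability simplex). It assumes the h_j of the products user i rated are stacked as columns of a k x m matrix. It uses plain ADMM with a sort-based Euclidean projection onto the simplex, in the spirit of the ADMM-based QuadraticMinimizer from the issue, but it is not that code. All function names (project_simplex, solve_simplex_qp, update_user) are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : sum(w) = 1, w >= 0} (sort-based)."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    cond = u - (css - 1.0) / ks > 0           # true for a prefix of indices
    m = ks[cond][-1]                          # size of the active set
    theta = (css[cond][-1] - 1.0) / m         # shift that makes the sum equal 1
    return np.maximum(v - theta, 0.0)

def solve_simplex_qp(Q, c, rho=1.0, iters=500):
    """ADMM for min 0.5 x'Qx + c'x  s.t. x on the probability simplex."""
    n = len(c)
    z = np.full(n, 1.0 / n)                   # feasible starting point
    u = np.zeros(n)                           # scaled dual variable
    A = Q + rho * np.eye(n)                   # same system matrix every iteration
    for _ in range(iters):
        x = np.linalg.solve(A, rho * (z - u) - c)  # unconstrained quadratic step
        z = project_simplex(x + u)                 # enforce sum = 1 and z >= 0
        u = u + x - z                              # dual update
    return z

def update_user(H_obs, r, lam):
    """Hypothetical per-user step: min ||r - H_obs' w||^2 + lam ||w||^2, w on simplex.

    H_obs: k x m factors of the m products this user rated; r: their m ratings.
    Halving the objective gives Q = H H' + lam I and c = -H r (same minimizer).
    """
    k = H_obs.shape[0]
    Q = H_obs @ H_obs.T + lam * np.eye(k)
    c = -H_obs @ r
    return solve_simplex_qp(Q, c)
```

The product step is symmetric: swap the roles of users and products and apply the same solver to each h_j.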

This is not quite the stochastic matrix factorization that the paper
http://www.machinelearning.ru/wiki/images/1/1f/Voron14aist.pdf describes as
PLSA, because PLSA needs the following constraint (I am still reading the
paper), together with a log-likelihood loss:

For each k: \sum_j h_{kj} = 1
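For contrast with the formulation above, here is a minimal sketch of plain PLSA fitted with EM, using the same W/H notation: each row of W satisfies \sum_k w_{ik} = 1 and, as PLSA requires, each row of H satisfies \sum_j h_{kj} = 1 (instead of each column). It assumes R is a users x items matrix of nonnegative counts in which every user row has at least one positive entry. It is a textbook PLSA sketch; the paper's method may add regularizers, and the function names are hypothetical.

```python
import numpy as np

def plsa_log_likelihood(R, W, H, eps=1e-12):
    # \sum_{i,j} r_ij * log(\sum_k w_ik h_kj), the quantity PLSA maximizes
    return float(np.sum(R * np.log(W @ H + eps)))

def plsa_em_step(R, W, H, eps=1e-12):
    # R: users x items, W: users x K (rows sum to 1), H: K x items (rows sum to 1)
    Q = R / (W @ H + eps)        # r_ij / p_ij, where p_ij = \sum_k w_ik h_kj
    W_new = W * (Q @ H.T)        # \sum_j r_ij * w_ik h_kj / p_ij
    H_new = H * (W.T @ Q)        # \sum_i r_ij * w_ik h_kj / p_ij
    W_new /= W_new.sum(axis=1, keepdims=True)  # for each user i: \sum_k w_ik = 1
    H_new /= H_new.sum(axis=1, keepdims=True)  # for each topic k: \sum_j h_kj = 1
    return W_new, H_new
```

Unlike the squared loss above, this objective is a log-likelihood, and each EM step does not decrease it.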

On the MovieLens dataset I ran the EQUALITY version as follows (rank = 50, 5
iterations). More iterations do not improve the results further.

./bin/spark-submit --total-executor-cores 4 --executor-memory 4g \
  --driver-memory 1g --master spark://TUSCA09LMLVT00C.local:7077 \
  --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar \
  --class org.apache.spark.examples.mllib.MovieLensALS \
  ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar \
  --rank 50 --numIterations 5 \
  --userConstraint EQUALITY --lambdaUser 0.065 \
  --productConstraint EQUALITY --lambdaProduct 0.065 \
  --kryo \
  --validateRecommendation hdfs://localhost:8020/sandbox/movielens/

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint EQUALITY productConstraint EQUALITY
Test RMSE = 1.6970509086529808.
Test users 6038 MAP 0.09333309533803603

So the best MAP results so far come from this formulation: about 9.3% versus
4.8% for the default, roughly a 2X improvement.

[~mengxr] [~srowen] it would be great if you could review the MAP calculation in
https://issues.apache.org/jira/browse/SPARK-4231 and help merge it into MLlib. I
am keen to find out whether there are bugs in the calculation.
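For reviewers, a minimal sketch of MAP under a common definition: for each test user, sum the precision at every rank where a held-out relevant item appears, divide by the number of relevant items, then average over users. Whether the SPARK-4231 code normalizes exactly this way (for example, by the number of relevant items rather than by the cutoff) is an assumption; the function names are hypothetical.

```python
def average_precision(predicted, relevant):
    # predicted: ranked list of item ids; relevant: set of held-out item ids
    if not relevant:
        return 0.0
    hits, prec_sum = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            prec_sum += hits / rank      # precision at this hit
    return prec_sum / len(relevant)

def mean_average_precision(per_user):
    # per_user: iterable of (predicted ranking, relevant items) pairs, one per test user
    scores = [average_precision(p, set(r)) for p, r in per_user]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a ranking [1, 2, 3] with relevant items {1, 3} scores (1/1 + 2/3) / 2.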

This is a bit surprising to me, since I have not yet finished the PLSA code the
paper describes (I am working on the bi-concave cost), which means the results
may improve further. Note the degradation in RMSE, though.
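A minimal sketch of the RMSE computation, assuming test triples (i, j, r_ij) and the W/H layout used above (users x k and k x products); the function name is hypothetical. Under both EQUALITY constraints as written, each prediction w_i^T h_j is a weighted average of the entries of h_j, which all lie in [0, 1], so every prediction lies in [0, 1]. If the test ratings are on the raw 1 to 5 MovieLens scale, that bound may account for part of the RMSE degradation; whether the run rescales ratings is an assumption.

```python
import numpy as np

def rmse(ratings, W, H):
    # ratings: iterable of (user i, product j, r_ij) test triples
    # W: users x k (rows on the simplex), H: k x products (columns on the simplex)
    errs = [(r - W[i] @ H[:, j]) ** 2 for i, j, r in ratings]
    return float(np.sqrt(np.mean(errs)))
```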

I will do runs on the Netflix dataset, but on our internal dataset (2M x 20K)
the trends look similar.

> Quadratic Minimization for MLlib ALS
> ------------------------------------
>
>                 Key: SPARK-2426
>                 URL: https://issues.apache.org/jira/browse/SPARK-2426
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Debasish Das
>            Assignee: Debasish Das
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Current ALS supports least squares and nonnegative least squares.
> I presented ADMM- and IPM-based Quadratic Minimization solvers to be used for
> the following ALS problems:
> 1. ALS with bounds
> 2. ALS with L1 regularization
> 3. ALS with Equality constraint and bounds
> Initial runtime comparisons were presented at Spark Summit 2014:
> http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
> Based on Xiangrui's feedback, I am currently comparing the ADMM-based
> Quadratic Minimization solvers with the IPM-based QpSolvers and the default
> ALS/NNLS. I will keep updating the runtime comparison results.
> For integration, the detailed plan is as follows:
> 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
> 2. Integrate QuadraticMinimizer in mllib ALS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
