[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2015-05-23 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557416#comment-14557416
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] Should I add the PR to Spark Packages and close the JIRA? The main 
contribution was adding sparsity constraints (L1 and probability simplex) to the 
user and product factors in implicit and explicit feedback factorization; 
interested users can pick up the features from Spark Packages if they need 
them...Later, if there is community interest, we can pull it into master ALS?

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.4.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
 2. Integrate QuadraticMinimizer in mllib ALS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2015-03-24 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377357#comment-14377357
 ] 

Debasish Das commented on SPARK-2426:
-

[~acopich] From your earlier comment, "Anyway, l2 regularized stochastic matrix 
decomposition problem is defined as follows:
Minimize w.r.t. W and H : ||R - W*H|| + \lambda(||W|| + ||H||)
under non-negativeness and normalization constraints.
||.|| stands for Frobenius norm (or l1)." Could you please point me to a good 
reference with application to collaborative filtering/topic modeling? 
Stochastic matrix decomposition is what we can do in this PR now:
https://github.com/apache/spark/pull/3221

For the MAP loss, I will open a PR in a week through JIRA 
https://issues.apache.org/jira/browse/SPARK-6323...I am very curious how much 
slower we get compared to stochastic matrix decomposition using ALS.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2015-03-22 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375325#comment-14375325
 ] 

Debasish Das commented on SPARK-2426:
-

[~acopich] Regarding your comment "There's a completely different loss... BTW, 
we've used a factorisation with the loss you've described as an initial 
approximation for PLSA. It gave a significant speed-up.": could you help add 
some test cases and a driver for the PLSA approximation? The PR 
https://github.com/apache/spark/pull/3221 now has the LSA constraints and least 
squares loss...

The idea here is to apply the probability simplex on the user side, bounds on 
the item side, and normalization of the item columns at each ALS 
iteration...The MAP loss is tracked through 
https://issues.apache.org/jira/browse/SPARK-6323, but the solve idea will be 
very similar, as I mentioned before, so we can reuse the flow and 
test cases...We can discuss more on the PR...It would also be great if you can 
help add examples.mllib.PLSA, which will drive both PLSA through ALS and ALM 
(alternating MAP loss optimization)...




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2015-03-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359695#comment-14359695
 ] 

Apache Spark commented on SPARK-2426:
-

User 'debasish83' has created a pull request for this issue:
https://github.com/apache/spark/pull/5005




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2015-03-07 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351839#comment-14351839
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] NNLS and QuadraticMinimizer are both merged to Breeze...I will 
migrate ml.recommendation.ALS accordingly...




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2015-02-03 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302932#comment-14302932
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] [~coderxiang] David is out in February and I am not sure if we can 
cut a breeze release with the code. I refactored NNLS to breeze.optimize.linear 
due to its similarity to the CG core. Proximal algorithms and 
QuadraticMinimizer are refactored to breeze.optimize.proximal. It would be 
great if you could also review the PR 
https://github.com/scalanlp/breeze/pull/321.

With this solver added to Breeze, I am ready to add the ALS modifications to 
Spark. The test cases for default ALS and NNLS run fine with my Spark PR. I 
need to add appropriate test cases for sparse coding and least squares loss 
with LSA constraints as explained above.

Should I add them to ml.als or mllib.als, since we now have two codebases? My 
current PR will merge fine with mllib.als but not with ml.als. I see there is a 
CholeskySolver, but all those features are supported in 
breeze.optimize.proximal.QuadraticMinimizer.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-12-11 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243149#comment-14243149
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] As per our discussion, QuadraticMinimizer and NNLS are both added to 
breeze and updated to use breeze DenseMatrix and DenseVector...Inside breeze I 
did some interesting comparisons, which motivated me to port NNLS to breeze as 
well...I added all the test cases for QuadraticMinimizer and NNLS based on my 
experiments with the MovieLens dataset...

Here is the PR: https://github.com/scalanlp/breeze/pull/321

To run the Quadratic programming variants in breeze:
runMain breeze.optimize.quadratic.QuadraticMinimizer 100 1 0.1 0.99
regParam = 0.1; beta = 0.99 is the elastic-net parameter
 
It will randomly generate quadratic problems with 100 variables, 1 equality 
constraint, and lower/upper bounds. This format is similar to the PDCO QP 
generator (please look at my Matlab examples):

0.5 x'Hx + c'x
s.t. Ax = B,
lb <= x <= ub
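As a concrete illustration of this problem format, here is a minimal pure-Python projected-gradient sketch for the box-constrained case (illustrative only; the actual solvers discussed here are ADMM and IPM based, and the 2x2 problem below is made up):

```python
def solve_box_qp(H, c, lb, ub, step=0.1, iters=500):
    """Minimize 0.5*x'Hx + c'x subject to lb <= x <= ub
    by projected gradient descent (sketch, not the PR's solver)."""
    n = len(c)
    x = [0.0] * n
    for _ in range(iters):
        # gradient of 0.5*x'Hx + c'x is H*x + c
        g = [sum(H[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        # gradient step, then clip each coordinate back into its box
        x = [min(max(x[i] - step * g[i], lb[i]), ub[i]) for i in range(n)]
    return x

# Made-up 2-variable problem: the unconstrained optimum is (-1, -1),
# so the box [0, 2]^2 forces the solution to the corner (0, 0).
H = [[2.0, 0.0], [0.0, 2.0]]
c = [2.0, 2.0]
x = solve_box_qp(H, c, lb=[0.0, 0.0], ub=[2.0, 2.0])
```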

1. Unconstrained minimization: breeze luSolve, cg, and qp (dposv added to 
breeze through this PR)

Minimize 0.5x'Hx + c'x

||qp - lu|| norm 4.312577233496585E-10 max-norm 1.3842793578078272E-10
||cg - lu|| norm 4.167925029822007E-7 max-norm 1.0053204402282745E-7
dim 100 lu 86.007 qp 41.56 cg 102.627

||qp - lu|| norm 4.267891623199082E-8 max-norm 6.681460718027665E-9
||cg - lu|| norm 1.94497623480055E-7 max-norm 2.6288773824489908E-8
dim 500 lu 169.993 qp 78.001 cg 443.044

qp is faster than cg for smaller dimensions as expected. I also tried 
unconstrained BFGS but the results were not good. We are looking into it.

2. Elastic Net formulation: 0.5 x'Hx + c'x + regParam*(1-beta)*L2(x) + 
regParam*beta*L1(x)
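The L1 term in this objective is typically handled through its proximal operator, elementwise soft-thresholding. A minimal sketch (the operator is standard; the input vector is made up):

```python
def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each coordinate toward
    zero by t, and zero out anything inside [-t, t]."""
    out = []
    for vi in v:
        if vi > t:
            out.append(vi - t)
        elif vi < -t:
            out.append(vi + t)
        else:
            out.append(0.0)
    return out

# small coordinates are zeroed out, which is what produces sparsity
shrunk = soft_threshold([0.5, -1.2, 0.05], 0.1)
```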

beta = 0.99 Strong L1 regParam=0.1
||owlqn - sparseqp|| norm 0.1653200701235298 inf-norm 0.051855911945906996
sparseQp 61.948 ms iters 227 owlqn 928.11 ms

beta = 0.5 average L1 regParam=0.1
||owlqn - sparseqp|| norm 0.15823773098501168 inf-norm 0.035153837685728107
sparseQp 69.934 ms iters 353 owlqn 882.104 ms

beta = 0.01 mostly BFGS regParam=0.1

||owlqn - sparseqp|| norm 0.17950035092790165 inf-norm 0.04718697692014828
sparseQp 80.411 ms iters 580 owlqn 988.313 ms

The ADMM based proximal formulation is faster for smaller dimensions. Even as I 
scale the dimension, I notice similar behavior: OWLQN takes longer to converge 
and the results are not the same. Look for example at the dim = 500 case:

||owlqn - sparseqp|| norm 10.946326189397649 inf-norm 1.412726586317294
sparseQp 830.593 ms iters 2417 owlqn 19848.932 ms

I validated ADMM through Matlab scripts so there is something funky going on in 
OWLQN.

3. NNLS formulation: 0.5 x'Hx + c'x s.t. x >= 0

Here I compared the ADMM based proximal formulation with the CG based projected 
gradient in NNLS. NNLS converges much more nicely, but its convergence 
criterion does not look the same as breeze CG's, though they should be the same.

For now I ported it to breeze, and we can call NNLS for x >= 0 and 
QuadraticMinimizer for the other formulations.

dim = 100 posQp 16.367 ms iters 284 nnls 8.854 ms iters 107
dim = 500 posQp 303.184 ms iters 950 nnls 183.543 ms iters 517

NNLS on average looks 2X faster!

4. Bounds formulation: 0.5 x'Hx + c'x s.t. lb <= x <= ub
Validated through Matlab scripts above. Here are the runtime numbers:

dim = 100 boundsQp 15.654 ms iters 284 converged true
dim= 500 boundsQp 311.613 ms iters 950 converged true

5. Equality and positivity: 0.5 x'Hx + c'x s.t. \sum_i x_i = 1, x_i >= 0
Validated through Matlab scripts above. Here are the runtime numbers:

dim = 100 Qp Equality 13.64 ms iters 184 converged true
dim = 500 Qp Equality 278.525 ms iters 890 converged true
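The equality-plus-positivity constraint in formulation 5 is the probability simplex. Euclidean projection onto it has a standard sort-based algorithm; a pure-Python sketch (not the ADMM proximal code from the PR):

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : sum(x) = 1, x >= 0},
    via the standard sort-and-threshold algorithm."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # last i with u_i > t fixes the shrinkage threshold
    return [max(vi - theta, 0.0) for vi in v]

p = project_simplex([1.0, 1.0])        # symmetric input -> (0.5, 0.5)
q = project_simplex([0.3, 0.2, 0.1])   # mass redistributed to sum to 1
```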

With this change all copyrights are moved to breeze. Once it merges, I will 
update the Spark PR. It also lets us move the ALS code to Breeze DenseMatrix 
and DenseVector.

My focus next will be to get a Truncated Newton running for convex cost since 
convex cost is required for PLSA, SVM and Neural Net formulations...

I am still puzzled why BFGS/OWLQN is not working well for the unconstrained 
case / L1 optimization. If TRON works well for the unconstrained case, that's 
what I will use for NonlinearMinimizer. I am looking more into it.


[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-12-11 Thread Valeriy Avanesov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243398#comment-14243398
 ] 

Valeriy Avanesov commented on SPARK-2426:
-

> What's the normalization constraint? Each row of W should sum up to 1 and 
> each column of H should sum up to 1, with positivity?

Yes.

> That is similar to PLSA, right, except that PLSA will have a bi-concave loss...

There's a completely different loss... BTW, we've used a factorisation with the 
loss you've described as an initial approximation for PLSA. It gave a 
significant speed-up.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-12-11 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243456#comment-14243456
 ] 

Debasish Das commented on SPARK-2426:
-

[~akopich] I got good MAP results on recommendation datasets with the 
approximated PLSA formulation. I have not yet had time to compare that 
formulation with the Gibbs sampling based LDA PR 
https://issues.apache.org/jira/browse/SPARK-1405. Did you compare them?




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-12-02 Thread Valeriy Avanesov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231375#comment-14231375
 ] 

Valeriy Avanesov commented on SPARK-2426:
-

I'm not sure I understand your question...

As far as I can see, w_i stands for a row of the matrix W and h_j stands for a 
column of the matrix H.

\sum_i \sum_j (r_ij - w_i*h_j) is not a matrix norm. Probably you are missing 
either an abs or a square: \sum_i \sum_j |r_ij - w_i*h_j| or 
\sum_i \sum_j (r_ij - w_i*h_j)^2.
It looks like l2 regularized stochastic matrix decomposition with respect to 
the Frobenius (or l1) norm. But I don't understand why you consider k 
optimization problems (do you? What does k \in {1 ... 25} stand for?).

Anyway, l2 regularized stochastic matrix decomposition problem is defined as 
follows 

Minimize w.r.t. W and H : ||R - W*H|| + \lambda(||W|| + ||H||)
under non-negativeness and normalization constraints. 

||..|| stands for Frobenius norm (or l1). 
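Spelled out with the Frobenius norm, the objective above can be evaluated directly; a small pure-Python helper (dense lists of rows stand in for the factor matrices, names are illustrative):

```python
import math

def frob(M):
    """Frobenius norm of a dense matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))

def matmul(W, H):
    """Dense product of an m-by-k matrix W and a k-by-n matrix H."""
    return [[sum(W[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(W))]

def objective(R, W, H, lam):
    """||R - W*H||_F + lambda * (||W||_F + ||H||_F)."""
    WH = matmul(W, H)
    resid = [[R[i][j] - WH[i][j] for j in range(len(R[0]))]
             for i in range(len(R))]
    return frob(resid) + lam * (frob(W) + frob(H))

# Toy check: identity factors reconstruct R exactly, so only the
# regularization term remains.
R = [[1.0, 0.0], [0.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
val = objective(R, I2, I2, 0.5)
```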

By the way: is the matrix of ranks r stochastic? Stochastic matrix 
decomposition doesn't seem reasonable if it's not. 




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-11-19 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218891#comment-14218891
 ] 

Debasish Das commented on SPARK-2426:
-

With the MAP measures added to examples.MovieLensALS through 
https://issues.apache.org/jira/browse/SPARK-4231, I compared the quality and 
runtime of the matrix completion formulations on the MovieLens 1M dataset:

Default: userConstraint L2, productConstraint L2, 
lambdaUser=lambdaProduct=0.065, rank=100, iterations 10

Test RMSE = 0.8436480113821955.
Test users 6038 MAP 0.05860164548002782

Solver: Cholesky decomposition followed by forward-backward solves
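The baseline solver, Cholesky factorization followed by forward and backward substitution, can be sketched in pure Python as follows (illustrative only; the 2x2 system is made up, and the real implementation uses LAPACK routines):

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L*L' for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A x = b: factor A = L*L', forward solve L y = b,
    then backward solve L' x = y."""
    n = len(b)
    L = cholesky(A)
    y = [0.0] * n
    for i in range(n):  # forward substitution
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Made-up SPD system; the exact solution is (0.25, 1.5).
x = solve_spd([[4.0, 2.0], [2.0, 3.0]], [4.0, 5.0])
```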

Per iteration runtime for baseline (solveTime in ms)

14/11/19 17:37:06 INFO ALS: usersOrProducts 924 slowConvergence 0 
QuadraticMinimizer solveTime 362.813 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 910 slowConvergence 0 
QuadraticMinimizer solveTime 314.527 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 927 slowConvergence 0 
QuadraticMinimizer solveTime 265.75 Iters 0
14/11/19 17:37:06 INFO ALS: usersOrProducts 918 slowConvergence 0 
QuadraticMinimizer solveTime 271.513 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1510 slowConvergence 0 
QuadraticMinimizer solveTime 370.177 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1512 slowConvergence 0 
QuadraticMinimizer solveTime 467.994 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1507 slowConvergence 0 
QuadraticMinimizer solveTime 511.894 Iters 0
14/11/19 17:37:09 INFO ALS: usersOrProducts 1511 slowConvergence 0 
QuadraticMinimizer solveTime 481.189 Iters 0

NMF: userConstraint POSITIVE, productConstraint POSITIVE, 
userLambda=productLambda=0.065 L2 regularization

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800670, test: 199539.
Quadratic minimization userConstraint POSITIVE productConstraint POSITIVE
Test RMSE = 0.8435335132641906.
Test users 6038 MAP 0.056361816590625446

ALS iteration 1 runtime:

QuadraticMinimizer convergence profile:

14/11/19 17:46:46 INFO ALS: usersOrProducts 918 slowConvergence 0 
QuadraticMinimizer solveTime 1936.281 Iters 73132
14/11/19 17:46:46 INFO ALS: usersOrProducts 927 slowConvergence 0 
QuadraticMinimizer solveTime 1871.364 Iters 75219
14/11/19 17:46:46 INFO ALS: usersOrProducts 910 slowConvergence 0 
QuadraticMinimizer solveTime 2067.735 Iters 73180
14/11/19 17:46:46 INFO ALS: usersOrProducts 924 slowConvergence 0 
QuadraticMinimizer solveTime 2127.161 Iters 75546
14/11/19 17:46:53 INFO ALS: usersOrProducts 1507 slowConvergence 0 
QuadraticMinimizer solveTime 3813.923 Iters 193207
14/11/19 17:46:54 INFO ALS: usersOrProducts 1511 slowConvergence 0 
QuadraticMinimizer solveTime 3894.068 Iters 196882
14/11/19 17:46:54 INFO ALS: usersOrProducts 1510 slowConvergence 0 
QuadraticMinimizer solveTime 3875.915 Iters 193987
14/11/19 17:46:54 INFO ALS: usersOrProducts 1512 slowConvergence 0 
QuadraticMinimizer solveTime 3939.765 Iters 192471

NNLS convergence profile:

14/11/19 17:46:46 INFO ALS: NNLS solveTime 252.909 iters 7381
14/11/19 17:46:46 INFO ALS: NNLS solveTime 256.803 iters 7740
14/11/19 17:46:46 INFO ALS: NNLS solveTime 274.352 iters 7491
14/11/19 17:46:46 INFO ALS: NNLS solveTime 272.971 iters 7664
14/11/19 17:46:53 INFO ALS: NNLS solveTime 1487.262 iters 60338
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1472.742 iters 61321
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1489.863 iters 62228
14/11/19 17:46:54 INFO ALS: NNLS solveTime 1494.192 iters 60489

ALS iteration 10

Quadratic Minimizer convergence profile:

14/11/19 17:48:17 INFO ALS: usersOrProducts 924 slowConvergence 0 
QuadraticMinimizer solveTime 1082.056 Iters 53724
14/11/19 17:48:17 INFO ALS: usersOrProducts 910 slowConvergence 0 
QuadraticMinimizer solveTime 1180.601 Iters 50593
14/11/19 17:48:17 INFO ALS: usersOrProducts 927 slowConvergence 0 
QuadraticMinimizer solveTime 1106.131 Iters 53069
14/11/19 17:48:17 INFO ALS: usersOrProducts 918 slowConvergence 0 
QuadraticMinimizer solveTime 1108.478 Iters 50895
14/11/19 17:48:23 INFO ALS: usersOrProducts 1510 slowConvergence 0 
QuadraticMinimizer solveTime 2262.193 Iters 116818
14/11/19 17:48:23 INFO ALS: usersOrProducts 1512 slowConvergence 0 
QuadraticMinimizer solveTime 2293.64 Iters 116026
14/11/19 17:48:23 INFO ALS: usersOrProducts 1507 slowConvergence 0 
QuadraticMinimizer solveTime 2241.491 Iters 116293
14/11/19 17:48:23 INFO ALS: usersOrProducts 1511 slowConvergence 0 
QuadraticMinimizer solveTime 2372.957 Iters 118391

NNLS convergence profile:

14/11/19 17:48:17 INFO ALS: NNLS solveTime 623.031 iters 21611
14/11/19 17:48:17 INFO ALS: NNLS solveTime 553.493 iters 21732
14/11/19 17:48:17 INFO ALS: NNLS solveTime 559.9 iters 22511
14/11/19 17:48:17 INFO ALS: NNLS solveTime 556.654 iters 21330
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1672.582 iters 86006
14/11/19 17:48:23 INFO ALS: NNLS solveTime 1703.221 iters 85824
14/11/19 17:48:23 INFO ALS: NNLS 

[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-11-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207801#comment-14207801
 ] 

Apache Spark commented on SPARK-2426:
-

User 'debasish83' has created a pull request for this issue:
https://github.com/apache/spark/pull/3221




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191551#comment-14191551
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] The matlab comparison scripts are open sourced over here:

https://github.com/debasish83/ecos/blob/master/matlab/admm/qprandom.m
https://github.com/debasish83/ecos/blob/master/matlab/pdco4/code/pdcotestQP.m

The detailed comparisons are in the README.md. Please look at the section on 
Matlab comparisons.

In a nutshell: for bounds, MOSEK and ADMM are similar; for elastic net, 
Proximal is 10X faster compared to MOSEK; for equality, MOSEK is 2-3X faster 
than Proximal, but both PDCO and ECOS produce much worse results compared to 
ADMM. Accelerated ADMM also did not work as well as default ADMM. Increasing 
the over-relaxation parameter helped accelerated ADMM, but I have not explored 
it yet.

ADMM and PDCO are in Matlab but ECOS and MOSEK are both using mex files so they 
are expected to be more efficient.

Next I will add the performance results of running positivity, box, sparse 
coding / regularized LSI, and robust PLSA on the MovieLens dataset, and validate 
product recommendations using the MAP measure. In terms of RMSE I will compare 
the default, positivity, and sparse-coding variants.

What are the largest datasets the LDA PRs are running on? I would like to try 
those with sparse coding as well. Based on these papers, sparse coding/RLSI 
should give results on par with LDA:

https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf

The same randomized matrices can be generated and run in the PR as follows:

{code}
./bin/spark-class org.apache.spark.mllib.optimization.QuadraticMinimizer 1000 1 1.0 0.99

rank=1000, equality=1.0 lambda=1.0 beta=0.99
L1regularization = lambda*beta L2regularization = lambda*(1-beta)

Generating randomized QPs with rank 1000 equalities 1
sparseQp 88.423 ms iterations 45 converged true
posQp 181.369 ms iterations 121 converged true
boundsQp 175.733 ms iterations 121 converged true
Qp Equality 2805.564 ms iterations 2230 converged true
{code}
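For reference, the lambda/beta split printed above is the standard elastic-net decomposition of a single regularization weight into L1 and L2 parts; a small Python sketch (the function name is mine, not the PR's):

```python
def elastic_net_split(lam, beta):
    """Split one regularization weight into (L1 weight, L2 weight).

    As in the run above: L1 = lambda * beta, L2 = lambda * (1 - beta),
    where beta is the elastic-net mixing parameter in [0, 1].
    """
    assert 0.0 <= beta <= 1.0, "beta must lie in [0, 1]"
    return lam * beta, lam * (1.0 - beta)
```

So `lambda=1.0, beta=0.99` from the command above yields an L1 weight of 0.99 and an L2 weight of 0.01.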




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191997#comment-14191997
 ] 

Debasish Das commented on SPARK-2426:
-

Matlab comparisons of MOSEK, ECOS, PDCO, and ADMM are available here:
https://github.com/debasish83/ecos/blob/master/README.md

MOSEK is available for research purposes. Let me know if there are issues 
running the Matlab scripts.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-31 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192935#comment-14192935
 ] 

Debasish Das commented on SPARK-2426:
-

Refactored QuadraticMinimizer and NNLS from mllib optimization to 
breeze.optimize.quadratic:
https://github.com/scalanlp/breeze/pull/321
I will update the PR as well, but the latest breeze depends on Scala 2.11 while 
Spark still uses 2.10.
All license and copyright information has also moved to breeze, so no changes 
to Spark's license/notice files are needed.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-19 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176358#comment-14176358
 ] 

Debasish Das commented on SPARK-2426:
-

[~mengxr] I thought more about it; one of the reasons we chose ADMM is that 
QuadraticMinimizer is not designed to be a local algorithm.

If it runs on the Spark master it will take an RDD; if it runs on a Spark 
worker, it will take H and c from x'Hx + c'x along with proximal operators.

I will update the API and show some POCs of how this meta algorithm can add 
LBFGS/Truncated Newton as a core solver for the x-solve in a scalable version 
of matrix factorization where we never want to create the H matrix explicitly.

Truncated Newton is a better choice for the constraints we want to support. I 
am working on a variant of TRON and the linear CG that is in breeze for the 
scalable version; those are the building blocks I need.

I am sure some of the code will move to Breeze: Proximal will definitely move, 
but QuadraticMinimizer will be refactored. It would really help if you can open 
a PR on the new ALS design you have so we can work on it.
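The meta algorithm described here is essentially scaled-form ADMM with a pluggable x-update; a minimal Python sketch under that assumption (rho is folded into `x_solve` and `prox`, and all names are illustrative, not the PR's API):

```python
def admm(x_solve, prox, n, iters):
    """Scaled-form ADMM for min f(x) + g(z) subject to x = z.

    x_solve(z, u) solves argmin_x f(x) + (rho/2)||x - (z - u)||^2 and is
    pluggable (direct solve, CG, LBFGS, ...), so the Hessian H never has
    to be formed explicitly; prox is the proximal operator handling the
    constraint or regularizer g.
    """
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        x = x_solve(z, u)                                   # x-update
        z = prox([xi + ui for xi, ui in zip(x, u)])          # z-update
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]    # dual update
    return z
```

With `prox` the identity and a least-squares `x_solve`, the iterates converge to the unconstrained minimizer; swapping in a bound, simplex, or L1 proximal operator gives the constrained variants discussed in this thread.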




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-17 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174824#comment-14174824
 ] 

Xiangrui Meng commented on SPARK-2426:
--

[~debasish83] Thanks for working on this feature! This is definitely a lot of 
work. We need to figure out a couple of high-level questions before looking 
into the code:

1. License. There are two files that require a special license: proximal, which 
ports cvxgrp/proximal (BSD), and QPMinimizer:

{code}
... distributed with Copyright (c) 2014, Debasish Das (Verizon), all rights 
reserved.
{code}

Code contributions to Apache follow the ICLA: 
http://www.apache.org/licenses/icla.txt . I'm not familiar with the terms. I saw

{code}
Except for the license granted herein to the Foundation and recipients of
software distributed by the Foundation, You reserve all right, title,
and interest in and to Your Contributions.
{code}

My understanding is that if you want your code distributed under the Apache 
License, we don't need a special notice about your rights. Please check with 
Verizon's legal team to make sure they are okay with it. It would be really 
helpful if someone can explain this in more detail.

2. Interface. I'm doing a refactoring of ALS (SPARK-3541). I hope we can 
decouple the solvers (LS, QP) from ALS. In

https://github.com/mengxr/spark-als/blob/master/src/main/scala/org/apache/spark/ml/SimpleALS.scala

The subproblem is wrapped in a NormalEquation, which stores AtA, Atb, and n. A 
Cholesky solver takes a NormalEquation instance, solves it, and returns the 
solution. We can plug in other solvers as long as NormalEquation provides all 
the information we need. Does this apply to your use cases?
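To make the decoupling concrete, here is a hypothetical Python sketch of the NormalEquation idea (not the actual SPARK-3541 code): the subproblem carries AtA, Atb, and n, and a dense Cholesky solve is one pluggable backend:

```python
class NormalEquation:
    """Holds the ALS subproblem: AtA, Atb, and the observation count n."""
    def __init__(self, k):
        self.k = k
        self.ata = [[0.0] * k for _ in range(k)]  # dense k x k Gram matrix
        self.atb = [0.0] * k
        self.n = 0                                 # observations folded in

def cholesky_solve(ne, reg_param):
    """Solve (AtA + reg_param * I) x = Atb via dense Cholesky."""
    k = ne.k
    a = [[ne.ata[i][j] + (reg_param if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    l = [[0.0] * k for _ in range(k)]
    for i in range(k):                  # factor a = L L^T
        for j in range(i + 1):
            s = sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                l[i][i] = (a[i][i] - s) ** 0.5
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    y = [0.0] * k                       # forward solve L y = Atb
    for i in range(k):
        y[i] = (ne.atb[i] - sum(l[i][p] * y[p] for p in range(i))) / l[i][i]
    x = [0.0] * k                       # back solve L^T x = y
    for i in range(k - 1, -1, -1):
        x[i] = (y[i] - sum(l[p][i] * x[p] for p in range(i + 1, k))) / l[i][i]
    return x
```

A QP or NNLS backend would take the same NormalEquation and add its constraint handling, which is exactly the plug-in point under discussion.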

For public APIs, we should restrict parameters to simple types, for example 
constraint = none | nonnegative | box. This is good for adding Python APIs, and 
those options should be sufficient for normal use cases. We can provide a 
developer API that allows advanced users to plug in their own solvers. You can 
check the current proposal of parameters at SPARK-3530.
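For illustration, the simple-type constraint parameter could be modeled as a small tagged hierarchy dispatched to solvers internally; the names below are hypothetical, not the SPARK-3530 proposal:

```python
from dataclasses import dataclass

class Constraint:
    """Base tag for the user-facing constraint option."""

class Unconstrained(Constraint):
    """Plain least squares."""

class Nonnegative(Constraint):
    """Nonnegative least squares."""

@dataclass
class Box(Constraint):
    """Bounded least squares: lower <= x <= upper."""
    lower: float
    upper: float

def pick_solver(c):
    # Internal dispatch: the public API only sees the simple constraint tag.
    if isinstance(c, Box):
        return f"qp with bounds [{c.lower}, {c.upper}]"
    if isinstance(c, Nonnegative):
        return "nnls"
    return "cholesky"
```

Advanced users could bypass the tags entirely through a developer API that accepts a solver object directly.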

3. Where to put the implementation? Including MLlib's NNLS, those solvers are 
for local problems. What sounds ideal to me is breeze.optimize, which already 
contains several optimization solvers and we use LBFGS implemented there and 
maybe OWLQN soon.

4. This PR definitely needs some time for testing. The feature freeze deadline 
for v1.2 is Oct 31, and I cannot promise time for code review given my current 
bandwidth. It would be great if you could share your MATLAB code (hopefully 
Octave compatible) and some performance results so more developers can help 
test.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-17 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175167#comment-14175167
 ] 

Debasish Das commented on SPARK-2426:
-

1. [~mengxr] Our legal team was clear that the Stanford and Verizon copyrights 
should show up in the COPYRIGHT.txt file. I saw other companies' copyrights 
there, so I did not think it would be a big issue.

2. For the new interface, we have two more requirements: convex loss functions 
(supporting huber loss / hinge loss etc.) and no explicit AtA construction, 
since once we start scaling to large factor counts for LSA, the AtA 
construction will start to choke. Can I work on your branch? 
https://github.com/mengxr/spark-als/blob/master/src/main/scala/org/apache/spark/ml/SimpleALS.scala

3. I agree to refactor the core solver, including NNLS, to breeze. That was the 
initial plan, but since we wanted to test the features on our internal 
datasets, integrating with mllib was faster. I am testing NNLS's CG 
implementation since, as soon as explicit AtA construction is taken out, we 
need to rely on CG in place of direct solvers. But I will refactor the solver 
out to breeze, and that will take the copyright messages to breeze as well.

4. Let me add the Matlab scripts and point to the repository. ECOS and MOSEK 
will need Matlab to run; PDCO and the Proximal variants will run fine on 
Octave. I am not sure whether MOSEK is supported on Octave.




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175377#comment-14175377
 ] 

Sean Owen commented on SPARK-2426:
--

Regarding licensing: if the code is BSD licensed, then it does not require an 
entry in the NOTICE file (BSD is a Category A license), and entries shouldn't 
be added to NOTICE unless required. I believe that in this case we will need to 
reproduce the text of the license in LICENSE, since it will not otherwise be 
included from a Maven artifact. So I suggest: don't change NOTICE, and move the 
license in LICENSE up to the section where other licenses are reproduced in 
full. It's a complex issue, but this is my best understanding of the right 
thing to do.





[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-10-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163027#comment-14163027
 ] 

Apache Spark commented on SPARK-2426:
-

User 'debasish83' has created a pull request for this issue:
https://github.com/apache/spark/pull/2705




[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095232#comment-14095232
 ] 

Debasish Das commented on SPARK-2426:
-

Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this 
week.

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all quadratic 
minimization.

Right now we support 5 features:

1. Least square
2. Least square with positivity
3. Least square with bounds : generalization of positivity
4. Least square with equality and positivity/bounds for LDA/PLSA
5. Least square + L1 constraint for sparse NMF

There are many regularizers in Proximal.scala that can be re-used in the mllib 
updater; L1Updater is an example of a proximal algorithm.
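For context, the proximal operator behind an L1 update step is soft-thresholding; a minimal sketch (illustrative only, not the Proximal.scala API):

```python
def prox_l1(v, step, lam):
    """Soft-thresholding: the proximal operator of lam * ||x||_1 with step size.

    Each coordinate is shrunk toward zero by step * lam and clipped at zero,
    which is what produces sparse solutions.
    """
    shrink = step * lam
    return [max(abs(vi) - shrink, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v]
```

Plugging operators like this into the updater is what lets the same gradient machinery handle L1, bounds, and other constraints.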

I feel we should move NNLS into QuadraticMinimizer as well and clean up 
ALS.scala, as you have suggested before.

QuadraticMinimizer is optimized for direct solves right now (Cholesky/LU, 
depending on the problem we are solving).

The CG core from NNLS should be used for iterative solves when ranks are high. 
I need a different variant of CG for Formulation 4, so the NNLS CG is not 
sufficient for all the formulations.

Right now I am experimenting with the ADMM rho and lambda values so that the 
NNLS iterations are on par with least squares with positivity.

I will publish results from the comparisons.

I will also publish comparisons of PDCO, ECOS (IPM), and MOSEK with the ADMM 
variants used in this branch.

For the recommendation use-case, I expect to reproduce the Jellyfish L1-ball 
projection results on the netflix/movielens datasets using Formulation 5.
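The L1-ball projection behind Formulation 5 can be sketched with the standard sort-based algorithm (Duchi et al.); this is an illustrative Python version, not the code in the branch:

```python
def project_l1_ball(v, z):
    """Euclidean projection of v onto the L1 ball {w : ||w||_1 <= z}."""
    assert z > 0.0, "radius must be positive"
    if sum(abs(vi) for vi in v) <= z:
        return list(v)                      # already inside the ball
    u = sorted((abs(vi) for vi in v), reverse=True)
    cum, theta = 0.0, 0.0
    for j, uj in enumerate(u):
        cum += uj
        t = (cum - z) / (j + 1)
        if uj - t > 0:                      # last j passing gives the threshold
            theta = t
    # shrink each coordinate by theta, keeping signs
    return [max(abs(vi) - theta, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v]
```

The result is the closest point in the L1 ball, so factors come out sparse whenever the radius is tight.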

Thanks.
Deb
