[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191551#comment-14191551 ]
Debasish Das edited comment on SPARK-2426 at 10/31/14 8:04 AM:
---------------------------------------------------------------

[~mengxr] The Matlab comparison scripts are open sourced here:
https://github.com/debasish83/ecos/blob/master/matlab/admm/qprandom.m
https://github.com/debasish83/ecos/blob/master/matlab/pdco4/code/pdcotestQP.m

The detailed comparisons are in the README.md; please see the section on Matlab comparisons. In a nutshell: for bounds, MOSEK and ADMM are similar; for elastic net, Proximal is 10X faster than MOSEK; for equality, MOSEK is 2-3X faster than Proximal, but both PDCO and ECOS produce much worse results than ADMM. Accelerated ADMM also did not work as well as default ADMM. Increasing the over-relaxation parameter helps ADMM, but I have not explored that yet. ADMM and PDCO are in Matlab, while ECOS and MOSEK both use mex files, so they are expected to be more efficient.

Next I will add the performance results of running positivity, box, sparse coding / regularized LSI, and robust-PLSA on the MovieLens dataset, and validate product recommendation using the MAP measure. In terms of RMSE, default < positive < sparse coding. What's the largest dataset the LDA PRs are running?
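For readers unfamiliar with the over-relaxation parameter mentioned above: it replaces the primal iterate with a weighted combination of the current solve and the previous consensus variable before the projection and dual updates. A minimal NumPy sketch of ADMM for a box-constrained QP (min 0.5 x'Hx + q'x subject to lo <= x <= hi) is below; the function name and the default alpha = 1.6 are illustrative choices, not taken from the PR's Scala code.

```python
import numpy as np

def admm_box_qp(H, q, lo, hi, rho=1.0, alpha=1.6, iters=500, tol=1e-8):
    """Solve min 0.5 x'Hx + q'x  s.t.  lo <= x <= hi  via ADMM.

    alpha in (1, 2) is the over-relaxation parameter; alpha = 1
    recovers plain ADMM.
    """
    n = len(q)
    x = z = u = np.zeros(n)
    # Factor (H + rho I) once; the same system is solved every iteration.
    M = H + rho * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(M, rho * (z - u) - q)
        x_hat = alpha * x + (1.0 - alpha) * z    # over-relaxation step
        z_old = z
        z = np.clip(x_hat + u, lo, hi)           # projection onto the box
        u = u + x_hat - z                        # scaled dual update
        # Stop when primal and dual residuals are both small.
        if (np.linalg.norm(x - z) < tol
                and np.linalg.norm(rho * (z - z_old)) < tol):
            break
    return z
```

For a separable instance such as H = 2I, q = (-2, -2) with the box [0, 0.5], the unconstrained minimizer (1, 1) is clipped to (0.5, 0.5), which the iteration recovers.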
I would like to try that on sparse coding as well. From these papers, sparse coding/RLSI should give results at par with LDA:
https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf

The same randomized matrices can be generated and run in the PR as follows:

./bin/spark-class org.apache.spark.mllib.optimization.QuadraticMinimizer 1000 1 1.0 0.99

rank=1000, equality=1.0, lambda=1.0, beta=0.99
L1 regularization = lambda*beta
L2 regularization = lambda*(1-beta)

Generating randomized QPs with rank 1000 equalities 1
sparseQp 88.423 ms iterations 45 converged true
posQp 181.369 ms iterations 121 converged true
boundsQp 175.733 ms iterations 121 converged true
Qp Equality 2805.564 ms iterations 2230 converged true
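The L1/L2 split above (L1 weight = lambda*beta, L2 weight = lambda*(1-beta)) is the standard elastic-net decomposition, and its proximal operator has a closed form: soft-thresholding for the L1 part followed by shrinkage for the L2 part. A minimal NumPy illustration (not the PR's Scala implementation; the function name is hypothetical):

```python
import numpy as np

def prox_elastic_net(v, step, lam, beta):
    """Proximal operator of lam*(beta*||x||_1 + 0.5*(1-beta)*||x||^2).

    Splits the regularizer the same way as the command line above:
    L1 weight = lam*beta, L2 weight = lam*(1-beta).
    """
    l1 = lam * beta
    l2 = lam * (1.0 - beta)
    # Soft-thresholding handles the L1 term ...
    shrunk = np.sign(v) * np.maximum(np.abs(v) - step * l1, 0.0)
    # ... and a multiplicative shrinkage handles the L2 term.
    return shrunk / (1.0 + step * l2)
```

With lambda = 1.0 and beta = 0.99 (the values in the command above), entries smaller in magnitude than 0.99 are zeroed out, which is what drives the sparsity in the sparseQp runs.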
> Quadratic Minimization for MLlib ALS
> ------------------------------------
>
> Key: SPARK-2426
> URL: https://issues.apache.org/jira/browse/SPARK-2426
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 1.0.0
> Reporter: Debasish Das
> Assignee: Debasish Das
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> Current ALS supports least squares and nonnegative least squares.
> I presented ADMM and IPM based Quadratic Minimization solvers to be used for
> the following ALS problems:
> 1. ALS with bounds
> 2. ALS with L1 regularization
> 3. ALS with equality constraint and bounds
> Initial runtime comparisons are presented at Spark Summit:
> http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
> Based on Xiangrui's feedback I am currently comparing the ADMM based
> Quadratic Minimization solvers with the IPM based QpSolvers and the default
> ALS/NNLS. I will keep updating the runtime comparison results.
> For integration the detailed plan is as follows:
> 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
> 2. Integrate QuadraticMinimizer in mllib ALS

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)