Repository: spark
Updated Branches:
  refs/heads/master 9ab725eab -> 82253617f


[SPARK-18705][ML][DOC] Update user guide to reflect one pass solver for L1 and elastic-net

## What changes were proposed in this pull request?

WeightedLeastSquares now supports L1 and elastic net penalties and has an 
additional solver option: QuasiNewton. The docs are updated to reflect this 
change.
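
For context, a minimal sketch of how the new behavior surfaces in the
user-facing API (illustrative only, not part of this patch; the dataset path
is the standard example file shipped in the Spark source tree):

    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("WLSExample").getOrCreate()
    // Standard example dataset from the Spark source tree.
    val training = spark.read.format("libsvm")
      .load("data/mllib/sample_linear_regression_data.txt")

    // Request the normal-equation path, which is backed by WeightedLeastSquares.
    // With elasticNetParam > 0 the L1 term is now handled by the Quasi-Newton
    // (OWL-QN) solver instead of being unsupported on this path.
    val lr = new LinearRegression()
      .setSolver("normal")       // one-pass normal equation solver
      .setRegParam(0.1)          // lambda: overall regularization strength
      .setElasticNetParam(0.5)   // alpha: L1/L2 mixing; > 0 now allowed here

    val model = lr.fit(training)
    println(s"Coefficients: ${model.coefficients}, intercept: ${model.intercept}")

As before, the normal-equation path only applies when the number of features
is at most 4096; larger problems should keep solver "l-bfgs".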

## How was this patch tested?

Docs only. Generated documentation to make sure the LaTeX looks OK.

Author: sethah <seth.hendrickso...@gmail.com>

Closes #16139 from sethah/SPARK-18705.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/82253617
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/82253617
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/82253617

Branch: refs/heads/master
Commit: 82253617f5b3cdbd418c48f94e748651ee80077e
Parents: 9ab725e
Author: sethah <seth.hendrickso...@gmail.com>
Authored: Wed Dec 7 19:41:32 2016 -0800
Committer: Yanbo Liang <yblia...@gmail.com>
Committed: Wed Dec 7 19:41:32 2016 -0800

----------------------------------------------------------------------
 docs/ml-advanced.md | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/82253617/docs/ml-advanced.md
----------------------------------------------------------------------
diff --git a/docs/ml-advanced.md b/docs/ml-advanced.md
index 12a03d3..2747f2d 100644
--- a/docs/ml-advanced.md
+++ b/docs/ml-advanced.md
@@ -59,17 +59,25 @@ Given $n$ weighted observations $(w_i, a_i, b_i)$:
 
 The number of features for each observation is $m$. We use the following weighted least squares formulation:
 `\[
-minimize_{x}\frac{1}{2} \sum_{i=1}^n \frac{w_i(a_i^T x -b_i)^2}{\sum_{k=1}^n w_k} + \frac{1}{2}\frac{\lambda}{\delta}\sum_{j=1}^m(\sigma_{j} x_{j})^2
+\min_{\mathbf{x}}\frac{1}{2} \sum_{i=1}^n \frac{w_i(\mathbf{a}_i^T \mathbf{x} -b_i)^2}{\sum_{k=1}^n w_k} + \frac{\lambda}{\delta}\left[\frac{1}{2}(1 - \alpha)\sum_{j=1}^m(\sigma_j x_j)^2 + \alpha\sum_{j=1}^m |\sigma_j x_j|\right]
 \]`
-where $\lambda$ is the regularization parameter, $\delta$ is the population standard deviation of the label
+where $\lambda$ is the regularization parameter, $\alpha$ is the elastic-net mixing parameter, $\delta$ is the population standard deviation of the label
 and $\sigma_j$ is the population standard deviation of the j-th feature column.
 
-This objective function has an analytic solution and it requires only one pass over the data to collect necessary statistics to solve.
-Unlike the original dataset which can only be stored in a distributed system,
-these statistics can be loaded into memory on a single machine if the number of features is relatively small, and then we can solve the objective function through Cholesky factorization on the driver.
+This objective function requires only one pass over the data to collect the statistics necessary to solve it. For an
+$n \times m$ data matrix, these statistics require only $O(m^2)$ storage and so can be stored on a single machine when $m$ (the number of features) is
+relatively small. We can then solve the normal equations on a single machine using local methods like direct Cholesky factorization or iterative optimization programs.
 
-WeightedLeastSquares only supports L2 regularization and provides options to enable or disable regularization and standardization.
-In order to make the normal equation approach efficient, WeightedLeastSquares requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.
+Spark MLlib currently supports two types of solvers for the normal equations: Cholesky factorization and Quasi-Newton methods (L-BFGS/OWL-QN). Cholesky factorization
+depends on a positive definite covariance matrix (i.e. columns of the data matrix must be linearly independent) and will fail if this condition is violated. Quasi-Newton methods
+are still capable of providing a reasonable solution even when the covariance matrix is not positive definite, so the normal equation solver can also fall back to
+Quasi-Newton methods in this case. This fallback is currently always enabled for the `LinearRegression` and `GeneralizedLinearRegression` estimators.
+
+`WeightedLeastSquares` supports L1, L2, and elastic-net regularization and provides options to enable or disable regularization and standardization. In the case where no
+L1 regularization is applied (i.e. $\alpha = 0$), there exists an analytical solution and either Cholesky or Quasi-Newton solver may be used. When $\alpha > 0$ no analytical
+solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively.
+
+In order to make the normal equation approach efficient, `WeightedLeastSquares` requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.
 
 ## Iteratively reweighted least squares (IRLS)
 
@@ -83,6 +91,6 @@ It solves certain optimization problems iteratively through the following proced
 * solve a weighted least squares (WLS) problem by WeightedLeastSquares.
 * repeat above steps until convergence.
 
-Since it involves solving a weighted least squares (WLS) problem by WeightedLeastSquares in each iteration,
+Since it involves solving a weighted least squares (WLS) problem by `WeightedLeastSquares` in each iteration,
 it also requires the number of features to be no more than 4096.
 Currently IRLS is used as the default solver of [GeneralizedLinearRegression](api/scala/index.html#org.apache.spark.ml.regression.GeneralizedLinearRegression).

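A worked sketch of why the $\alpha = 0$ case above admits a closed form
(illustrative, reusing the notation from the updated section; not part of the
patch):

    % With \alpha = 0 the elastic-net bracket keeps only the L2 term, so the
    % objective reduces to ridge regression:
    \min_{\mathbf{x}} \frac{1}{2} \sum_{i=1}^n
      \frac{w_i(\mathbf{a}_i^T \mathbf{x} - b_i)^2}{\sum_{k=1}^n w_k}
      + \frac{1}{2} \frac{\lambda}{\delta} \sum_{j=1}^m (\sigma_j x_j)^2
    % Setting the gradient to zero gives normal equations of the form
    %   (A^T W A + \Lambda) \mathbf{x} = A^T W \mathbf{b}
    % whose inputs are exactly the one-pass statistics, so the m-by-m system
    % can be solved locally, e.g. by Cholesky factorization. When \alpha > 0,
    % the |\sigma_j x_j| terms are non-differentiable at zero, so no such
    % closed form exists and the Quasi-Newton solver (OWL-QN) is used instead.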

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
