[45/51] [partial] incubator-madlib-site git commit: Update doc for 1.9.1 release

xtang Tue, 20 Sep 2016 11:40:02 -0700

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__linreg.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__linreg.html b/docs/latest/group__grp__linreg.html
index 6b205a3..5ca4bc8 100644
--- a/docs/latest/group__grp__linreg.html
+++ b/docs/latest/group__grp__linreg.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -123,7 +123,7 @@ $(document).ready(function(){initNavTree('group__grp__linreg.html','');});
 <li class="level1">
 <a href="#related">Related Topics</a> </li>
 </ul>
-</div><p>Linear regression models a linear relationship of a scalar dependent 
variable <img class="formulaInl" alt="$ y $" src="form_323.png"/> to one or 
more explanatory independent variables <img class="formulaInl" alt="$ x $" 
src="form_178.png"/> to build a model of coefficients.</p>
+</div><p>Linear regression models a linear relationship of a scalar dependent 
variable <img class="formulaInl" alt="$ y $" src="form_324.png"/> to one or 
more explanatory independent variables <img class="formulaInl" alt="$ x $" 
src="form_178.png"/> to build a model of coefficients.</p>
 <p><a class="anchor" id="train"></a></p><dl class="section user"><dt>Training 
Function</dt><dd></dd></dl>
 <p>The linear regression training function has the following syntax. </p><pre 
class="syntax">
 linregr_train( source_table,
@@ -154,7 +154,7 @@ linregr_train( source_table,
 <tr>
 <th>p_values </th><td>FLOAT8[]. Vector of the p-values of the coefficients.  
</td></tr>
 <tr>
-<th>condition_no </th><td>FLOAT8 array. The condition number of the <img 
class="formulaInl" alt="$X^{*}X$" src="form_324.png"/> matrix. A high condition 
number is usually an indication that there may be some numeric instability in 
the result yielding a less reliable model. A high condition number often 
results when there is a significant amount of colinearity in the underlying 
design matrix, in which case other regression techniques, such as elastic net 
regression, may be more appropriate.  </td></tr>
+<th>condition_no </th><td>FLOAT8 array. The condition number of the <img 
class="formulaInl" alt="$X^{*}X$" src="form_325.png"/> matrix. A high condition 
number is usually an indication that there may be some numeric instability in 
the result yielding a less reliable model. A high condition number often 
results when there is a significant amount of colinearity in the underlying 
design matrix, in which case other regression techniques, such as elastic net 
regression, may be more appropriate.  </td></tr>
 <tr>
 <th>bp_stats </th><td>FLOAT8. The Breush-Pagan statistic of heteroskedacity. 
Present only if the heteroskedacity argument was set to True when the model was 
trained.  </td></tr>
 <tr>
@@ -240,14 +240,14 @@ COPY houses FROM STDIN WITH DELIMITER '|';
  15 |  650 |       3 |  1.5 |  65000 | 1450 | 12000
 \.
 </pre></li>
-<li>Train a regression model. First, a single regression for all the data. 
<pre class="example">
+<li>Train a regression model. First, we generate a single regression for all 
data. <pre class="example">
 SELECT madlib.linregr_train( 'houses',
                              'houses_linregr',
                              'price',
                              'ARRAY[1, tax, bath, size]'
                            );
-</pre></li>
-<li>Generate three output models, one for each value of "bedroom". <pre 
class="example">
+</pre> (Note that in this example we are dynamically creating the array of 
independent variables from column names. If you have large numbers of 
independent variables beyond the PostgreSQL limit of maximum columns per table, 
you would pre-build the arrays and store them in a single column.)</li>
+<li>Next we generate three output models, one for each value of "bedroom". 
<pre class="example">
 SELECT madlib.linregr_train( 'houses',
                              'houses_linregr_bedroom',
                              'price',
@@ -320,43 +320,43 @@ FROM houses, houses_linregr m;
 <p><a class="anchor" id="notes"></a></p><dl class="section 
user"><dt>Note</dt><dd>All table names can be optionally schema qualified 
(current_schemas() would be searched if a schema name is not provided) and all 
table and column names should follow case-sensitivity and quoting rules per the 
database. (For instance, 'mytable' and 'MyTable' both resolve to the same 
entity, i.e. 'mytable'. If mixed-case or multi-byte characters are desired for 
entity names then the string should be double-quoted; in this case the input 
would be '"MyTable"').</dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section 
user"><dt>Technical Background</dt><dd></dd></dl>
 <p>Ordinary least-squares (OLS) linear regression refers to a stochastic model 
in which the conditional mean of the dependent variable (usually denoted <img 
class="formulaInl" alt="$ Y $" src="form_3.png"/>) is an affine function of the 
vector of independent variables (usually denoted <img class="formulaInl" alt="$ 
\boldsymbol x $" src="form_58.png"/>). That is, </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ E[Y \mid \boldsymbol x] = \boldsymbol c^T 
\boldsymbol x \]" src="form_325.png"/>
+<img class="formulaDsp" alt="\[ E[Y \mid \boldsymbol x] = \boldsymbol c^T 
\boldsymbol x \]" src="form_326.png"/>
 </p>
 <p> for some unknown vector of coefficients <img class="formulaInl" alt="$ 
\boldsymbol c $" src="form_78.png"/>. The assumption is that the residuals are 
i.i.d. distributed Gaussians. That is, the (conditional) probability density of 
<img class="formulaInl" alt="$ Y $" src="form_3.png"/> is given by </p><p 
class="formulaDsp">
-<img class="formulaDsp" alt="\[ f(y \mid \boldsymbol x) = \frac{1}{\sqrt{2 \pi 
\sigma^2}} \cdot \exp\left(-\frac{1}{2 \sigma^2} \cdot (y - \boldsymbol x^T 
\boldsymbol c)^2 \right) \,. \]" src="form_326.png"/>
+<img class="formulaDsp" alt="\[ f(y \mid \boldsymbol x) = \frac{1}{\sqrt{2 \pi 
\sigma^2}} \cdot \exp\left(-\frac{1}{2 \sigma^2} \cdot (y - \boldsymbol x^T 
\boldsymbol c)^2 \right) \,. \]" src="form_327.png"/>
 </p>
 <p> OLS linear regression finds the vector of coefficients <img 
class="formulaInl" alt="$ \boldsymbol c $" src="form_78.png"/> that maximizes 
the likelihood of the observations.</p>
 <p>Let</p><ul>
-<li><img class="formulaInl" alt="$ \boldsymbol y \in \mathbf R^n $" 
src="form_327.png"/> denote the vector of observed dependent variables, with 
<img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the 
observed values of the dependent variable,</li>
+<li><img class="formulaInl" alt="$ \boldsymbol y \in \mathbf R^n $" 
src="form_328.png"/> denote the vector of observed dependent variables, with 
<img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the 
observed values of the dependent variable,</li>
 <li><img class="formulaInl" alt="$ X \in \mathbf R^{n \times k} $" 
src="form_98.png"/> denote the design matrix with <img class="formulaInl" 
alt="$ k $" src="form_97.png"/> columns and <img class="formulaInl" alt="$ n $" 
src="form_10.png"/> rows, containing all observed vectors of independent 
variables. <img class="formulaInl" alt="$ \boldsymbol x_i $" 
src="form_99.png"/> as rows,</li>
-<li><img class="formulaInl" alt="$ X^T $" src="form_328.png"/> denote the 
transpose of <img class="formulaInl" alt="$ X $" src="form_2.png"/>,</li>
-<li><img class="formulaInl" alt="$ X^+ $" src="form_329.png"/> denote the 
pseudo-inverse of <img class="formulaInl" alt="$ X $" src="form_2.png"/>.</li>
+<li><img class="formulaInl" alt="$ X^T $" src="form_329.png"/> denote the 
transpose of <img class="formulaInl" alt="$ X $" src="form_2.png"/>,</li>
+<li><img class="formulaInl" alt="$ X^+ $" src="form_330.png"/> denote the 
pseudo-inverse of <img class="formulaInl" alt="$ X $" src="form_2.png"/>.</li>
 </ul>
-<p>Maximizing the likelihood is equivalent to maximizing the log-likelihood 
<img class="formulaInl" alt="$ \sum_{i=1}^n \log f(y_i \mid \boldsymbol x_i) $" 
src="form_330.png"/>, which simplifies to minimizing the <b>residual sum of 
squares</b> <img class="formulaInl" alt="$ RSS $" src="form_331.png"/> (also 
called sum of squared residuals or sum of squared errors of prediction), </p><p 
class="formulaDsp">
-<img class="formulaDsp" alt="\[ RSS = \sum_{i=1}^n ( y_i - \boldsymbol c^T 
\boldsymbol x_i )^2 = (\boldsymbol y - X \boldsymbol c)^T (\boldsymbol y - X 
\boldsymbol c) \,. \]" src="form_332.png"/>
+<p>Maximizing the likelihood is equivalent to maximizing the log-likelihood 
<img class="formulaInl" alt="$ \sum_{i=1}^n \log f(y_i \mid \boldsymbol x_i) $" 
src="form_331.png"/>, which simplifies to minimizing the <b>residual sum of 
squares</b> <img class="formulaInl" alt="$ RSS $" src="form_332.png"/> (also 
called sum of squared residuals or sum of squared errors of prediction), </p><p 
class="formulaDsp">
+<img class="formulaDsp" alt="\[ RSS = \sum_{i=1}^n ( y_i - \boldsymbol c^T 
\boldsymbol x_i )^2 = (\boldsymbol y - X \boldsymbol c)^T (\boldsymbol y - X 
\boldsymbol c) \,. \]" src="form_333.png"/>
 </p>
-<p> The first-order conditions yield that the <img class="formulaInl" alt="$ 
RSS $" src="form_331.png"/> is minimized at </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \boldsymbol c = (X^T X)^+ X^T \boldsymbol y 
\,. \]" src="form_333.png"/>
+<p> The first-order conditions yield that the <img class="formulaInl" alt="$ 
RSS $" src="form_332.png"/> is minimized at </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \boldsymbol c = (X^T X)^+ X^T \boldsymbol y 
\,. \]" src="form_334.png"/>
 </p>
-<p>Computing the <b>total sum of squares</b> <img class="formulaInl" alt="$ 
TSS $" src="form_334.png"/>, the <b>explained sum of squares</b> <img 
class="formulaInl" alt="$ ESS $" src="form_335.png"/> (also called the 
regression sum of squares), and the <b>coefficient of determination</b> <img 
class="formulaInl" alt="$ R^2 $" src="form_336.png"/> is done according to the 
following formulas: </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\begin{align*} ESS &amp; = \boldsymbol y^T X 
\boldsymbol c - \frac{ \| y \|_1^2 }{n} \\ TSS &amp; = \sum_{i=1}^n y_i^2 - 
\frac{ \| y \|_1^2 }{n} \\ R^2 &amp; = \frac{ESS}{TSS} \end{align*}" 
src="form_337.png"/>
+<p>Computing the <b>total sum of squares</b> <img class="formulaInl" alt="$ 
TSS $" src="form_335.png"/>, the <b>explained sum of squares</b> <img 
class="formulaInl" alt="$ ESS $" src="form_336.png"/> (also called the 
regression sum of squares), and the <b>coefficient of determination</b> <img 
class="formulaInl" alt="$ R^2 $" src="form_337.png"/> is done according to the 
following formulas: </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\begin{align*} ESS &amp; = \boldsymbol y^T X 
\boldsymbol c - \frac{ \| y \|_1^2 }{n} \\ TSS &amp; = \sum_{i=1}^n y_i^2 - 
\frac{ \| y \|_1^2 }{n} \\ R^2 &amp; = \frac{ESS}{TSS} \end{align*}" 
src="form_338.png"/>
 </p>
-<p> Note: The last equality follows from the definition <img 
class="formulaInl" alt="$ R^2 = 1 - \frac{RSS}{TSS} $" src="form_338.png"/> and 
the fact that for linear regression <img class="formulaInl" alt="$ TSS = RSS + 
ESS $" src="form_339.png"/>. A proof of the latter can be found, e.g., at: <a 
href="http://en.wikipedia.org/wiki/Sum_of_squares">http://en.wikipedia.org/wiki/Sum_of_squares</a></p>
-<p>We estimate the variance <img class="formulaInl" alt="$ Var[Y - \boldsymbol 
c^T \boldsymbol x \mid \boldsymbol x] $" src="form_340.png"/> as </p><p 
class="formulaDsp">
-<img class="formulaDsp" alt="\[ \sigma^2 = \frac{RSS}{n - k} \]" 
src="form_341.png"/>
+<p> Note: The last equality follows from the definition <img 
class="formulaInl" alt="$ R^2 = 1 - \frac{RSS}{TSS} $" src="form_339.png"/> and 
the fact that for linear regression <img class="formulaInl" alt="$ TSS = RSS + 
ESS $" src="form_340.png"/>. A proof of the latter can be found, e.g., at: <a 
href="http://en.wikipedia.org/wiki/Sum_of_squares">http://en.wikipedia.org/wiki/Sum_of_squares</a></p>
+<p>We estimate the variance <img class="formulaInl" alt="$ Var[Y - \boldsymbol 
c^T \boldsymbol x \mid \boldsymbol x] $" src="form_341.png"/> as </p><p 
class="formulaDsp">
+<img class="formulaDsp" alt="\[ \sigma^2 = \frac{RSS}{n - k} \]" 
src="form_342.png"/>
 </p>
 <p> and compute the t-statistic for coefficient <img class="formulaInl" alt="$ 
i $" src="form_32.png"/> as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ t_i = \frac{c_i}{\sqrt{\sigma^2 \cdot \left( 
(X^T X)^{-1} \right)_{ii} }} \,. \]" src="form_342.png"/>
+<img class="formulaDsp" alt="\[ t_i = \frac{c_i}{\sqrt{\sigma^2 \cdot \left( 
(X^T X)^{-1} \right)_{ii} }} \,. \]" src="form_343.png"/>
 </p>
-<p>The <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for 
coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> gives the 
probability of seeing a value at least as extreme as the one observed, provided 
that the null hypothesis ( <img class="formulaInl" alt="$ c_i = 0 $" 
src="form_111.png"/>) is true. Letting <img class="formulaInl" alt="$ F_\nu $" 
src="form_343.png"/> denote the cumulative density function of student-t with 
<img class="formulaInl" alt="$ \nu $" src="form_273.png"/> degrees of freedom, 
the <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for 
coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> is 
therefore </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ p_i = \Pr(|T| \geq |t_i|) = 2 \cdot (1 - F_{n 
- k}( |t_i| )) \]" src="form_344.png"/>
+<p>The <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for 
coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> gives the 
probability of seeing a value at least as extreme as the one observed, provided 
that the null hypothesis ( <img class="formulaInl" alt="$ c_i = 0 $" 
src="form_111.png"/>) is true. Letting <img class="formulaInl" alt="$ F_\nu $" 
src="form_344.png"/> denote the cumulative density function of student-t with 
<img class="formulaInl" alt="$ \nu $" src="form_274.png"/> degrees of freedom, 
the <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for 
coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> is 
therefore </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ p_i = \Pr(|T| \geq |t_i|) = 2 \cdot (1 - F_{n 
- k}( |t_i| )) \]" src="form_345.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ T $" src="form_303.png"/> is a 
student-t distributed random variable with mean 0.</p>
-<p>The condition number [2] <img class="formulaInl" alt="$ \kappa(X) = 
\|X\|_2\cdot\|X^{-1}\|_2$" src="form_345.png"/> is computed as the product of 
two spectral norms [3]. The spectral norm of a matrix <img class="formulaInl" 
alt="$X$" src="form_346.png"/> is the largest singular value of <img 
class="formulaInl" alt="$X$" src="form_346.png"/> i.e. the square root of the 
largest eigenvalue of the positive-semidefinite matrix <img class="formulaInl" 
alt="$X^{*}X$" src="form_324.png"/>:</p>
+<p> where <img class="formulaInl" alt="$ T $" src="form_304.png"/> is a 
student-t distributed random variable with mean 0.</p>
+<p>The condition number [2] <img class="formulaInl" alt="$ \kappa(X) = 
\|X\|_2\cdot\|X^{-1}\|_2$" src="form_346.png"/> is computed as the product of 
two spectral norms [3]. The spectral norm of a matrix <img class="formulaInl" 
alt="$X$" src="form_347.png"/> is the largest singular value of <img 
class="formulaInl" alt="$X$" src="form_347.png"/> i.e. the square root of the 
largest eigenvalue of the positive-semidefinite matrix <img class="formulaInl" 
alt="$X^{*}X$" src="form_325.png"/>:</p>
 <p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \|X\|_2 = 
\sqrt{\lambda_{\max}\left(X^{*}X\right)}\ , \]" src="form_347.png"/>
+<img class="formulaDsp" alt="\[ \|X\|_2 = 
\sqrt{\lambda_{\max}\left(X^{*}X\right)}\ , \]" src="form_348.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$X^{*}$" src="form_348.png"/> is the 
conjugate transpose of <img class="formulaInl" alt="$X$" src="form_346.png"/>. 
The condition number of a linear regression problem is a worst-case measure of 
how sensitive the result is to small perturbations of the input. A large 
condition number (say, more than 1000) indicates the presence of significant 
multicollinearity.</p>
+<p> where <img class="formulaInl" alt="$X^{*}$" src="form_349.png"/> is the 
conjugate transpose of <img class="formulaInl" alt="$X$" src="form_347.png"/>. 
The condition number of a linear regression problem is a worst-case measure of 
how sensitive the result is to small perturbations of the input. A large 
condition number (say, more than 1000) indicates the presence of significant 
multicollinearity.</p>
 <p><a class="anchor" id="literature"></a></p><dl class="section 
user"><dt>Literature</dt><dd></dd></dl>
 <p>[1] Cosma Shalizi: Statistics 36-350: Data Mining, Lecture Notes, 21 
October 2009, <a 
href="http://www.stat.cmu.edu/~cshalizi/350/lectures/17/lecture-17.pdf">http://www.stat.cmu.edu/~cshalizi/350/lectures/17/lecture-17.pdf</a></p>
 <p>[2] Wikipedia: Condition Number, <a 
href="http://en.wikipedia.org/wiki/Condition_number">http://en.wikipedia.org/wiki/Condition_number</a>.</p>
@@ -373,7 +373,7 @@ FROM houses, houses_linregr m;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
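
The Technical Background text in the linreg page above walks through the OLS coefficient, RSS, ESS, TSS, R^2 and variance formulas. As a reading aid, here is a minimal pure-Python sketch of those formulas. It is not MADlib code: the names `solve` and `ols_summary` are hypothetical, the sketch assumes X^T X is invertible (so the pseudo-inverse equals the ordinary inverse), and it uses (sum of y_i)^2 / n for the correction term, which coincides with the 1-norm form when all y_i are nonnegative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_summary(X, y):
    """Return coefficients and sums of squares for y ~ X (X has n rows, k columns)."""
    n, k = len(X), len(X[0])
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    c = solve(xtx, xty)
    # RSS = sum_i (y_i - c^T x_i)^2
    rss = sum((y[r] - sum(c[i] * X[r][i] for i in range(k))) ** 2
              for r in range(n))
    correction = sum(y) ** 2 / n
    ess = sum(c[i] * xty[i] for i in range(k)) - correction  # y^T X c - correction
    tss = sum(v * v for v in y) - correction
    return {"coef": c, "rss": rss, "ess": ess, "tss": tss,
            "r2": ess / tss, "sigma2": rss / (n - k)}


# Tiny illustrative data set (made up): intercept column plus one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 8.0]
result = ols_summary(X, y)
```

Because the design matrix here includes an intercept column, the identity TSS = RSS + ESS holds for `result`, which is what makes R^2 = ESS / TSS agree with 1 - RSS / TSS.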


http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__lmf.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__lmf.html b/docs/latest/group__grp__lmf.html
index b21f530..0a3f03f 100644
--- a/docs/latest/group__grp__lmf.html
+++ b/docs/latest/group__grp__lmf.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -263,7 +263,7 @@ WHERE id = 1;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__logreg.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__logreg.html b/docs/latest/group__grp__logreg.html
index 27b27cb..fe349dc 100644
--- a/docs/latest/group__grp__logreg.html
+++ b/docs/latest/group__grp__logreg.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -173,7 +173,7 @@ logregr_train( source_table,
 <p class="endtd"></p>
 </td></tr>
 <tr>
-<th>condition_no </th><td><p class="starttd">FLOAT8[]. The condition number of 
the <img class="formulaInl" alt="$X^{*}X$" src="form_324.png"/> matrix. A high 
condition number is usually an indication that there may be some numeric 
instability in the result yielding a less reliable model. A high condition 
number often results when there is a significant amount of colinearity in the 
underlying design matrix, in which case other regression techniques may be more 
appropriate. </p>
+<th>condition_no </th><td><p class="starttd">FLOAT8[]. The condition number of 
the <img class="formulaInl" alt="$X^{*}X$" src="form_325.png"/> matrix. A high 
condition number is usually an indication that there may be some numeric 
instability in the result yielding a less reliable model. A high condition 
number often results when there is a significant amount of colinearity in the 
underlying design matrix, in which case other regression techniques may be more 
appropriate. </p>
 <p class="endtd"></p>
 </td></tr>
 <tr>
@@ -312,7 +312,7 @@ SELECT madlib.logregr_train( 'patients',
                              20,
                              'irls'
                            );
-</pre></li>
+</pre> (Note that in this example we are dynamically creating the array of 
independent variables from column names. If you have large numbers of 
independent variables beyond the PostgreSQL limit of maximum columns per table, 
you would pre-build the arrays and store them in a single column.)</li>
 <li>View the regression results. <pre class="example">
 -- Set extended display on for easier reading of output
 \x on
@@ -356,19 +356,19 @@ ORDER BY p.id;
 </dd></dl>
 <p><a class="anchor" id="notes"></a></p><dl class="section 
user"><dt>Notes</dt><dd>All table names can be optionally schema qualified 
(current_schemas() would be searched if a schema name is not provided) and all 
table and column names should follow case-sensitivity and quoting rules per the 
database. (For instance, 'mytable' and 'MyTable' both resolve to the same 
entity, i.e. 'mytable'. If mixed-case or multi-byte characters are desired for 
entity names then the string should be double-quoted; in this case the input 
would be '"MyTable"').</dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section 
user"><dt>Technical Background</dt><dd></dd></dl>
-<p>(Binomial) logistic regression refers to a stochastic model in which the 
conditional mean of the dependent dichotomous variable (usually denoted <img 
class="formulaInl" alt="$ Y \in \{ 0,1 \} $" src="form_353.png"/>) is the 
logistic function of an affine function of the vector of independent variables 
(usually denoted <img class="formulaInl" alt="$ \boldsymbol x $" 
src="form_58.png"/>). That is, </p><p class="formulaDsp">
+<p>(Binomial) logistic regression refers to a stochastic model in which the 
conditional mean of the dependent dichotomous variable (usually denoted <img 
class="formulaInl" alt="$ Y \in \{ 0,1 \} $" src="form_354.png"/>) is the 
logistic function of an affine function of the vector of independent variables 
(usually denoted <img class="formulaInl" alt="$ \boldsymbol x $" 
src="form_58.png"/>). That is, </p><p class="formulaDsp">
 <img class="formulaDsp" alt="\[ E[Y \mid \boldsymbol x] = \sigma(\boldsymbol 
c^T \boldsymbol x) \]" src="form_94.png"/>
 </p>
 <p> for some unknown vector of coefficients <img class="formulaInl" alt="$ 
\boldsymbol c $" src="form_78.png"/> and where <img class="formulaInl" alt="$ 
\sigma(x) = \frac{1}{1 + \exp(-x)} $" src="form_95.png"/> is the logistic 
function. Logistic regression finds the vector of coefficients <img 
class="formulaInl" alt="$ \boldsymbol c $" src="form_78.png"/> that maximizes 
the likelihood of the observations.</p>
 <p>Let</p><ul>
-<li><img class="formulaInl" alt="$ \boldsymbol y \in \{ 0,1 \}^n $" 
src="form_354.png"/> denote the vector of observed dependent variables, with 
<img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the 
observed values of the dependent variable,</li>
+<li><img class="formulaInl" alt="$ \boldsymbol y \in \{ 0,1 \}^n $" 
src="form_355.png"/> denote the vector of observed dependent variables, with 
<img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the 
observed values of the dependent variable,</li>
 <li><img class="formulaInl" alt="$ X \in \mathbf R^{n \times k} $" 
src="form_98.png"/> denote the design matrix with <img class="formulaInl" 
alt="$ k $" src="form_97.png"/> columns and <img class="formulaInl" alt="$ n $" 
src="form_10.png"/> rows, containing all observed vectors of independent 
variables <img class="formulaInl" alt="$ \boldsymbol x_i $" src="form_99.png"/> 
as rows.</li>
 </ul>
 <p>By definition, </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ P[Y = y_i | \boldsymbol x_i] = \sigma((-1)^{(1 
- y_i)} \cdot \boldsymbol c^T \boldsymbol x_i) \,. \]" src="form_355.png"/>
+<img class="formulaDsp" alt="\[ P[Y = y_i | \boldsymbol x_i] = \sigma((-1)^{(1 
- y_i)} \cdot \boldsymbol c^T \boldsymbol x_i) \,. \]" src="form_356.png"/>
 </p>
 <p> Maximizing the likelihood <img class="formulaInl" alt="$ \prod_{i=1}^n 
\Pr(Y = y_i \mid \boldsymbol x_i) $" src="form_101.png"/> is equivalent to 
maximizing the log-likelihood <img class="formulaInl" alt="$ \sum_{i=1}^n \log 
\Pr(Y = y_i \mid \boldsymbol x_i) $" src="form_102.png"/>, which simplifies to 
</p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ l(\boldsymbol c) = -\sum_{i=1}^n \log(1 + 
\exp((-1)^{(1 - y_i)} \cdot \boldsymbol c^T \boldsymbol x_i)) \,. \]" 
src="form_356.png"/>
+<img class="formulaDsp" alt="\[ l(\boldsymbol c) = -\sum_{i=1}^n \log(1 + 
\exp((-1)^{(1 - y_i)} \cdot \boldsymbol c^T \boldsymbol x_i)) \,. \]" 
src="form_357.png"/>
 </p>
 <p> The Hessian of this objective is <img class="formulaInl" alt="$ H = -X^T A 
X $" src="form_104.png"/> where <img class="formulaInl" alt="$ A = 
\text{diag}(a_1, \dots, a_n) $" src="form_105.png"/> is the diagonal matrix 
with <img class="formulaInl" alt="$ a_i = \sigma(\boldsymbol c^T \boldsymbol x) 
\cdot \sigma(-\boldsymbol c^T \boldsymbol x) \,. $" src="form_106.png"/> Since 
<img class="formulaInl" alt="$ H $" src="form_107.png"/> is non-positive 
definite, <img class="formulaInl" alt="$ l(\boldsymbol c) $" 
src="form_79.png"/> is convex. There are many techniques for solving convex 
optimization problems. Currently, logistic regression in MADlib can use one of 
three algorithms:</p><ul>
 <li>Iteratively Reweighted Least Squares</li>
@@ -410,7 +410,7 @@ ORDER BY p.id;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
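
The logreg page above defines P[Y = y_i | x_i] through the logistic function and gives the diagonal weights a_i that appear in the Hessian. As a reading aid, here is a minimal pure-Python sketch that evaluates the log-likelihood directly from that probability definition, together with the weights a_i = sigma(c^T x_i) * sigma(-c^T x_i). It is not MADlib code: the function names and the sample data are hypothetical, and no optimizer (IRLS or otherwise) is included.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(c, X, y):
    """Sum over i of log P[Y = y_i | x_i], with P = sigma((-1)^(1 - y_i) * c^T x_i)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(cj * xj for cj, xj in zip(c, xi))
        sign = 1.0 if yi == 1 else -1.0  # value of (-1)^(1 - y_i) for y_i in {0, 1}
        total += math.log(sigmoid(sign * z))
    return total


def hessian_weights(c, X):
    """Diagonal entries a_i = sigma(c^T x_i) * sigma(-c^T x_i) of the matrix A."""
    weights = []
    for xi in X:
        z = sum(cj * xj for cj, xj in zip(c, xi))
        weights.append(sigmoid(z) * sigmoid(-z))
    return weights


# Made-up data: intercept column plus one regressor, three observations.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1, 0, 1]
ll_at_zero = log_likelihood([0.0, 0.0], X, y)
weights_at_zero = hessian_weights([0.0, 0.0], X)
```

With all coefficients at zero every observation has probability 0.5, so the log-likelihood is n * log(0.5) and every weight is 0.25, which gives a quick sanity check before plugging in fitted coefficients.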

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__marginal.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__marginal.html b/docs/latest/group__grp__marginal.html
index d43a334..26aae59 100644
--- a/docs/latest/group__grp__marginal.html
+++ b/docs/latest/group__grp__marginal.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -123,7 +123,7 @@ $(document).ready(function(){initNavTree('group__grp__marginal.html','');});
 <li>
 <a href="#related">Related Topics</a> </li>
 </ul>
-</div><p>A marginal effect (ME) or partial effect measures the effect on the 
conditional mean of <img class="formulaInl" alt="$ y $" src="form_323.png"/> 
for a change in one of the regressors, say <img class="formulaInl" alt="$X_k$" 
src="form_366.png"/>. In the linear regression model, the ME equals the 
relevant slope coefficient, greatly simplifying analysis. For nonlinear models, 
specialized algorithms are required for calculating ME. The marginal effect 
computed is the average of the marginal effect at every data point present in 
the source table.</p>
+</div><p>A marginal effect (ME) or partial effect measures the effect on the 
conditional mean of <img class="formulaInl" alt="$ y $" src="form_324.png"/> 
for a change in one of the regressors, say <img class="formulaInl" alt="$X_k$" 
src="form_367.png"/>. In the linear regression model, the ME equals the 
relevant slope coefficient, greatly simplifying analysis. For nonlinear models, 
specialized algorithms are required for calculating ME. The marginal effect 
computed is the average of the marginal effect at every data point present in 
the source table.</p>
 <p>MADlib provides marginal effects regression functions for linear, logistic 
and multinomial logistic regressions.</p>
 <dl class="section warning"><dt>Warning</dt><dd>The <a class="el" 
href="marginal_8sql__in.html#a9517d679ee4209126895445cbed51fe3">margins_logregr()</a>
 and <a class="el" 
href="marginal_8sql__in.html#ae39ad0e1beca060fd153dba35901a4e7">margins_mlogregr()</a>
 functions have been deprecated in favor of the <a class="el" 
href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal 
effects with default variable_names. ">margins()</a> function.</dd></dl>
 <p><a class="anchor" id="margins"></a></p><dl class="section 
user"><dt>Marginal Effects with Interaction Terms</dt><dd><pre class="syntax">
@@ -398,16 +398,16 @@ p_values     | {0.00729989838349161,0.181668346802398,8.89828265128986e-17}
 </ol>
 <p><a class="anchor" id="notes"></a> </p><dl class="section 
note"><dt>Note</dt><dd>The <em>marginal_vars</em> argument is a list with the 
names matching those in 'x_design'. If no 'x_design' is present (i.e. no 
interaction and no indicator variables), then <em>marginal_vars</em> must be 
the indices (base 1) of variables in 'independent_varname'. Use <em>NULL</em> 
to use all independent variables. It is important to note that the 
<em>independent_varname</em> array in the underlying regression is assumed to 
start with a lower bound index of 1. Arrays that don't follow this would result 
in an incorrect solution.</dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section 
user"><dt>Technical Background</dt><dd></dd></dl>
-<p>The standard approach to modeling dichotomous/binary variables (so <img 
class="formulaInl" alt="$y \in \{0, 1\} $" src="form_367.png"/>) is to estimate 
a generalized linear model under the assumption that <img class="formulaInl" 
alt="$ y $" src="form_323.png"/> follows some form of Bernoulli distribution. 
Thus the expected value of <img class="formulaInl" alt="$ y $" 
src="form_323.png"/> becomes, </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ y = G(X' \beta), \]" src="form_368.png"/>
+<p>The standard approach to modeling dichotomous/binary variables (so <img 
class="formulaInl" alt="$y \in \{0, 1\} $" src="form_368.png"/>) is to estimate 
a generalized linear model under the assumption that <img class="formulaInl" 
alt="$ y $" src="form_324.png"/> follows some form of Bernoulli distribution. 
Thus the expected value of <img class="formulaInl" alt="$ y $" 
src="form_324.png"/> becomes, </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ y = G(X' \beta), \]" src="form_369.png"/>
 </p>
-<p>where G is the specified binomial distribution. For logistic regression, 
the function <img class="formulaInl" alt="$ G $" src="form_369.png"/> 
represents the inverse logit function.</p>
+<p>where G is the specified binomial distribution. For logistic regression, 
the function <img class="formulaInl" alt="$ G $" src="form_370.png"/> 
represents the inverse logit function.</p>
 <p>In logistic regression: </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + 
\dots \beta_j x_j)}} = \frac{1}{1 + e^{-z}} \implies \frac{\partial P}{\partial 
X_k} = \beta_k \cdot \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\ = 
\beta_k \cdot P \cdot (1-P) \]" src="form_370.png"/>
+<img class="formulaDsp" alt="\[ P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + 
\dots \beta_j x_j)}} = \frac{1}{1 + e^{-z}} \implies \frac{\partial P}{\partial 
X_k} = \beta_k \cdot \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\ = 
\beta_k \cdot P \cdot (1-P) \]" src="form_371.png"/>
 </p>
 <p>There are several methods for calculating the marginal effects for 
dichotomous dependent variables. This package uses the average of the marginal 
effects at every sample observation.</p>
 <p>This is calculated as follows: </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \frac{\partial y}{\partial x_k} = \beta_k 
\frac{\sum_{i=1}^n P(y_i = 1)(1-P(y_i = 1))}{n}, \\ \text{where}, P(y_i=1) = 
g(X^{(i)}\beta) \]" src="form_371.png"/>
+<img class="formulaDsp" alt="\[ \frac{\partial y}{\partial x_k} = \beta_k 
\frac{\sum_{i=1}^n P(y_i = 1)(1-P(y_i = 1))}{n}, \\ \text{where}, P(y_i=1) = 
g(X^{(i)}\beta) \]" src="form_372.png"/>
 </p>
 <p>We use the delta method for calculating standard errors on the marginal 
effects.</p>
 <p><a class="anchor" id="literature"></a></p><dl class="section 
user"><dt>Literature</dt><dd></dd></dl>
@@ -419,7 +419,7 @@ p_values     | 
{0.00729989838349161,0.181668346802398,8.89828265128986e-17}
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
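
The averaging formula in the Technical Background hunk above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names sigmoid and average_marginal_effects are hypothetical and are not MADlib functions, each row is assumed to already carry any intercept column, and standard errors (the delta method) are not covered.

```python
import math


def sigmoid(z):
    """Inverse logit: P = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effects(rows, beta):
    """Average marginal effect of each variable in a logistic model.

    rows: list of feature vectors (hypothetical input layout, one per observation)
    beta: fitted coefficients, same length as each row
    Returns beta_k * mean_i(P_i * (1 - P_i)) for every k.
    """
    n = len(rows)
    scale = 0.0
    for x in rows:
        # P(y_i = 1) for this observation
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        scale += p * (1.0 - p)
    scale /= n
    return [b * scale for b in beta]
```

The common factor mean(P(1-P)) is computed once because it does not depend on k; only the coefficient changes per variable.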

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__matrix.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__matrix.html 
b/docs/latest/group__grp__matrix.html
index d2814d7..fe4148c 100644
--- a/docs/latest/group__grp__matrix.html
+++ b/docs/latest/group__grp__matrix.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -287,16 +287,16 @@ 
$(document).ready(function(){initNavTree('group__grp__matrix.html','');});
 <dt>matrix_out </dt>
 <dd><p class="startdd">TEXT. Name of the table to store the result matrix.</p>
 <p>For Cholesky, QR and LU decompositions, a prefix 
(<em>matrix_out_prefix</em>) is used as a basis to build the names of the 
various output tables.</p>
-<p>For Cholesky decomposition ( <img class="formulaInl" alt="$ PA = LDL* $" 
src="form_545.png"/>), the following suffixes are added to 
<em>matrix_out_prefix</em>:</p><ul>
+<p>For Cholesky decomposition ( <img class="formulaInl" alt="$ PA = LDL* $" 
src="form_189.png"/>), the following suffixes are added to 
<em>matrix_out_prefix</em>:</p><ul>
 <li><em>_p</em> for row permutation matrix P</li>
 <li><em>_l</em> for lower triangular factor L</li>
 <li><em>_d</em> for diagonal matrix D</li>
 </ul>
-<p>For QR decomposition ( <img class="formulaInl" alt="$ A = QR $" 
src="form_189.png"/>) the following suffixes are added to 
<em>matrix_out_prefix</em>:</p><ul>
+<p>For QR decomposition ( <img class="formulaInl" alt="$ A = QR $" 
src="form_190.png"/>) the following suffixes are added to 
<em>matrix_out_prefix</em>:</p><ul>
 <li><em>_q</em> for orthogonal matrix Q</li>
 <li><em>_r</em> for upper triangular factor R</li>
 </ul>
-<p>For LU decomposition with full pivoting ( <img class="formulaInl" alt="$ 
PAQ = LU $" src="form_190.png"/>), the following suffixes are added to 
<em>matrix_out_prefix</em>:</p><ul>
+<p>For LU decomposition with full pivoting ( <img class="formulaInl" alt="$ 
PAQ = LU $" src="form_191.png"/>), the following suffixes are added to 
<em>matrix_out_prefix</em>:</p><ul>
 <li><em>_p</em> for row permutation matrix P</li>
 <li><em>_q</em> for column permutation matrix Q</li>
 <li><em>_l</em> for lower triangular factor L</li>
@@ -873,7 +873,7 @@ SELECT madlib.matrix_norm('"mat_A_sparse"', 'row="rowNum", 
col=col_num, val=entr
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__matrix__factorization.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__matrix__factorization.html 
b/docs/latest/group__grp__matrix__factorization.html
index a838c4d..2759dd8 100644
--- a/docs/latest/group__grp__matrix__factorization.html
+++ b/docs/latest/group__grp__matrix__factorization.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -127,7 +127,7 @@ Modules</h2></td></tr>
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mdl.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mdl.html b/docs/latest/group__grp__mdl.html
index d52ca5d..f9f6d31 100644
--- a/docs/latest/group__grp__mdl.html
+++ b/docs/latest/group__grp__mdl.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -112,11 +112,15 @@ 
$(document).ready(function(){initNavTree('group__grp__mdl.html','');});
 </div><!--header-->
 <div class="contents">
 <a name="details" id="details"></a><h2 class="groupheader">Detailed 
Description</h2>
-<p>Contains the cross-validation module, a collection of routines useful for 
<a 
href="http://en.wikipedia.org/wiki/Cross-validation_(statistics)">Cross-validation</a>.
 </p>
+<p>Contains functions for evaluating accuracy and validation of predictive 
methods. </p>
 <table class="memberdecls">
 <tr class="heading"><td colspan="2"><h2 class="groupheader"><a 
name="groups"></a>
 Modules</h2></td></tr>
 <tr class="memitem:group__grp__validation"><td class="memItemLeft" 
align="right" valign="top">&#160;</td><td class="memItemRight" 
valign="bottom"><a class="el" href="group__grp__validation.html">Cross 
Validation</a></td></tr>
+<tr class="memdesc:group__grp__validation"><td 
class="mdescLeft">&#160;</td><td class="mdescRight">Estimates the fit of a 
predictive model given a data set and specifications for the training, 
prediction, and error estimation functions. <br /></td></tr>
+<tr class="separator:"><td class="memSeparator" colspan="2">&#160;</td></tr>
+<tr class="memitem:group__grp__pred"><td class="memItemLeft" align="right" 
valign="top">&#160;</td><td class="memItemRight" valign="bottom"><a class="el" 
href="group__grp__pred.html">Prediction Metrics</a></td></tr>
+<tr class="memdesc:group__grp__pred"><td class="mdescLeft">&#160;</td><td 
class="mdescRight">Provides various prediction accuracy metrics. <br 
/></td></tr>
 <tr class="separator:"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table>
 </div><!-- contents -->
@@ -124,7 +128,7 @@ Modules</h2></td></tr>
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mdl.js
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mdl.js b/docs/latest/group__grp__mdl.js
index 0dafbec..f2e3c69 100644
--- a/docs/latest/group__grp__mdl.js
+++ b/docs/latest/group__grp__mdl.js
@@ -1,4 +1,5 @@
 var group__grp__mdl =
 [
-    [ "Cross Validation", "group__grp__validation.html", null ]
+    [ "Cross Validation", "group__grp__validation.html", null ],
+    [ "Prediction Metrics", "group__grp__pred.html", null ]
 ];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mfvsketch.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mfvsketch.html 
b/docs/latest/group__grp__mfvsketch.html
index 2c9a5e6..e26f58a 100644
--- a/docs/latest/group__grp__mfvsketch.html
+++ b/docs/latest/group__grp__mfvsketch.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -162,7 +162,7 @@ FROM data;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:11 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mlogreg.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mlogreg.html 
b/docs/latest/group__grp__mlogreg.html
index 5497bf9..560c951 100644
--- a/docs/latest/group__grp__mlogreg.html
+++ b/docs/latest/group__grp__mlogreg.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -402,7 +402,7 @@ coef                     | 
{{1.45474045211601,0.0849956182104023,-0.017238349960
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:11 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__multinom.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__multinom.html 
b/docs/latest/group__grp__multinom.html
index 47d2591..76ef400 100644
--- a/docs/latest/group__grp__multinom.html
+++ b/docs/latest/group__grp__multinom.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -473,7 +473,7 @@ SELECT * FROM test3_prd_prob;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__ordinal.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__ordinal.html 
b/docs/latest/group__grp__ordinal.html
index cb170d9..786e1e3 100644
--- a/docs/latest/group__grp__ordinal.html
+++ b/docs/latest/group__grp__ordinal.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -456,7 +456,7 @@ SELECT * FROM test3_prd_prob;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__path.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__path.html 
b/docs/latest/group__grp__path.html
index bb4227f..78e9f56 100644
--- a/docs/latest/group__grp__path.html
+++ b/docs/latest/group__grp__path.html
@@ -6,7 +6,7 @@
 <meta http-equiv="X-UA-Compatible" content="IE=9"/>
 <meta name="generator" content="Doxygen 1.8.10"/>
 <meta name="keywords" content="madlib,postgres,greenplum,machine learning,data 
mining,deep learning,ensemble methods,data science,market basket 
analysis,affinity analysis,pca,lda,regression,elastic net,huber 
white,proportional hazards,k-means,latent dirichlet allocation,bayes,support 
vector machines,svm"/>
-<title>MADlib: Path Functions</title>
+<title>MADlib: Path</title>
 <link href="tabs.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="jquery.js"></script>
 <script type="text/javascript" src="dynsections.js"></script>
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -106,7 +106,7 @@ 
$(document).ready(function(){initNavTree('group__grp__path.html','');});
 
 <div class="header">
   <div class="headertitle">
-<div class="title">Path Functions<div class="ingroups"><a class="el" 
href="group__grp__utility__functions.html">Utility Functions</a></div></div>  
</div>
+<div class="title">Path<div class="ingroups"><a class="el" 
href="group__grp__utility__functions.html">Utility Functions</a></div></div>  
</div>
 </div><!--header-->
 <div class="contents">
 <div class="toc"><b>Contents</b> </p><ul>
@@ -140,7 +140,8 @@ path(
     symbol,
     pattern,
     aggregate_func,
-    persist_rows
+    persist_rows,
+    overlapping_patterns
 )
 </pre></dd></dl>
 <p><b>Arguments</b> </p><dl class="arglist">
@@ -153,7 +154,7 @@ path(
 <p class="enddd"></p>
 </dd>
 <dt>partition_expr </dt>
-<dd><p class="startdd">VARCHAR. The 'partition_expr' can be a single column or 
a list of comma-separated columns/expressions to divide all rows into groups, 
or partitions. Matching is applied across the rows that fall into t he same 
partition. This can be NULL or '' to indicate the matching is to be applied to 
the whole table.</p>
+<dd><p class="startdd">VARCHAR. The 'partition_expr' can be a single column or 
a list of comma-separated columns/expressions to divide all rows into groups, 
or partitions. Matching is applied across the rows that fall into the same 
partition. This can be NULL or '' to indicate the matching is to be applied to 
the whole table.</p>
 <p class="enddd"></p>
 </dd>
 <dt>order_expr </dt>
@@ -191,13 +192,17 @@ Parentheses () can be used to group items into a single 
logical item. </li>
 </ul>
 <p class="enddd"></p>
 </dd>
-<dt>aggregate_func </dt>
-<dd><p class="startdd">VARCHAR. A comma-separated list of aggregates to be 
applied to the pattern matches [3]. Please note that window functions cannot 
currently be used in the parameter 'aggregate_func'. If you want to use a 
window function [4], output the pattern matches and write a SQL query with a 
window function over the output tuples (see 'persist_rows' parameter below).</p>
+<dt>aggregate_func (optional) </dt>
+<dd><p class="startdd">VARCHAR, default NULL. A comma-separated list of 
aggregates to be applied to the pattern matches [3]. Please note that window 
functions cannot currently be used in the parameter 'aggregate_func'. If you 
want to use a window function [4], output the pattern matches and write a SQL 
query with a window function over the output tuples (see 'persist_rows' 
parameter below).</p>
 <p>If you just want to output the pattern matched rows and not compute any 
aggregates, you can put NULL or '' in the 'aggregate_func' parameter. </p>
 <p class="enddd"></p>
 </dd>
-<dt>persist_rows </dt>
-<dd><p class="startdd">BOOLEAN. If TRUE the matched rows are persisted in a 
separate output table. This table is named as &lt;output_table&gt;_tuples (the 
string "_tuples" is added as suffix to the value of <em>output_table</em>). </p>
+<dt>persist_rows (optional) </dt>
+<dd><p class="startdd">BOOLEAN, default FALSE. If TRUE the matched rows are 
persisted in a separate output table. This table is named as 
&lt;output_table&gt;_tuples (the string "_tuples" is added as suffix to the 
value of <em>output_table</em>). </p>
+<p class="enddd"></p>
+</dd>
+<dt>overlapping_patterns (optional) </dt>
+<dd><p class="startdd">BOOLEAN, default FALSE. If TRUE find every occurrence 
of the pattern in the partition, regardless of whether it might have been part 
of a previously found match. </p>
 <p class="enddd"></p>
 </dd>
 </dl>
@@ -205,7 +210,7 @@ Parentheses () can be used to group items into a single 
logical item. </li>
 <p>The data set describes shopper behavior on a notional web site that sells 
beer and wine. A beacon fires an event to a log file when the shopper visits 
different pages on the site: landing page, beer selection page, wine selection 
page, and checkout. Other pages on the site like help pages show up in the logs 
as well. Let's assume that the log has been sessionized.</p>
 <p>Create the data table:</p>
 <pre class="example">
-DROP TABLE IF EXISTS eventlog, path_output, path_output_tuples;
+DROP TABLE IF EXISTS eventlog;
 CREATE TABLE eventlog (event_timestamp TIMESTAMP,
             user_id INT,
             session_id INT,
@@ -248,7 +253,8 @@ INSERT INTO eventlog VALUES
 ('04/15/2015 02:19:00', 103711, 109, 'WINE', 0);
 </pre><ol type="1">
 <li>Calculate the revenue by checkout: <pre class="example">
- SELECT madlib.path(
+DROP TABLE IF EXISTS path_output, path_output_tuples;
+SELECT madlib.path(
      'eventlog',                -- Name of input table
      'path_output',             -- Table name to store path results
      'session_id',              -- Partition input table by session
@@ -294,6 +300,7 @@ SELECT * FROM path_output_tuples ORDER BY session_id ASC, 
event_timestamp ASC;
 (6 rows)
 </pre> Notice that the 'symbol' and 'match_id' columns are added to the right 
of the matched rows.</li>
 <li>We are interested in sessions with an order placed within 4 pages of 
entering the shopping site via the landing page. We represent this by the 
regular expression: '(land)[^(land)(buy)]{0,2}(buy)'. In other words, visit to 
the landing page followed by from 0 to 2 non-entry, non-sale pages, followed by 
a purchase. The SQL is as follows: <pre class="example">
+DROP TABLE IF EXISTS path_output, path_output_tuples;
 SELECT madlib.path(
      'eventlog',                -- Name of input table
      'path_output',             -- Table name to store path results
@@ -348,6 +355,7 @@ SELECT DATE(event_timestamp), user_id, session_id, revenue,
 (3 rows)
 </pre> Here we are partitioning the window function by day because we want 
daily averages, although our sample data set only has a single day.</li>
 <li>Now we want to do a golden path analysis to find the most successful 
shopper paths through the site. Since our data set is small, we decide this 
means the most frequently viewed page just before a checkout is made: <pre 
class="example">
+DROP TABLE IF EXISTS path_output, path_output_tuples;
 SELECT madlib.path(
      'eventlog',                -- Name of input table
      'path_output',             -- Table name to store path results
@@ -362,25 +370,67 @@ SELECT madlib.path(
      'array_agg(page ORDER BY session_id ASC, event_timestamp ASC) as page_path',    -- Build array with shopper paths
      FALSE                       -- Don't persist matches
      );
-</pre></li>
-</ol>
-<p>Now count the common paths and print the most frequent:</p>
-<pre class="example">
+</pre> Now count the common paths and print the most frequent: <pre 
class="example">
 SELECT count(*), page_path from
     (SELECT * FROM path_output) q
 GROUP BY page_path
 ORDER BY count(*) DESC
 LIMIT 10;
-</pre><p>Result: </p><pre class="result">
+</pre> Result: <pre class="result">
  count |    page_path
 -------+-----------------
      5 | {WINE,CHECKOUT}
      1 | {BEER,CHECKOUT}
 (2 rows)
-</pre><p>There are only 2 different paths. The wine page is viewed more 
frequently than the beer page just before checkout.</p>
-<p><a class="anchor" id="note"></a></p><dl class="section 
note"><dt>Note</dt><dd>Please note some current limitations of the path 
algorithm. These limitations will be addressed in subsequent releases.<ul>
+</pre> There are only 2 different paths. The wine page is viewed more 
frequently than the beer page just before checkout.</li>
+<li>To demonstrate the use of 'overlapping_patterns', consider a pattern with 
at least one page followed by and ending with a checkout: <pre class="example">
+DROP TABLE IF EXISTS path_output, path_output_tuples;
+SELECT madlib.path(
+     'eventlog',                    -- Name of the table
+     'path_output',                 -- Table name to store the path results
+     'session_id',                  -- Partition by session
+     'event_timestamp ASC',         -- Order partitions in input table by time
+     $$ nobuy:=page&lt;&gt;'CHECKOUT',
+        buy:=page='CHECKOUT'
+     $$,  -- Definition of symbols used in the pattern definition
+     '(nobuy)+(buy)',         -- At least one page followed by and ending with a CHECKOUT.
+     'array_agg(page ORDER BY session_id ASC, event_timestamp ASC) as page_path',
+     FALSE,                        -- Don't persist matches
+     TRUE                          -- Turn on overlapping patterns
+     );
+SELECT * FROM path_output ORDER BY session_id, match_id;
+</pre> Result with overlap turned on: <pre class="result">
+ session_id | match_id |             page_path             
+------------+----------+-----------------------------------
+        100 |        1 | {LANDING,WINE,CHECKOUT}
+        100 |        2 | {WINE,CHECKOUT}
+        102 |        1 | {LANDING,WINE,CHECKOUT}
+        102 |        2 | {WINE,CHECKOUT}
+        102 |        3 | {LANDING,HELP,WINE,CHECKOUT}
+        102 |        4 | {HELP,WINE,CHECKOUT}
+        102 |        5 | {WINE,CHECKOUT}
+        103 |        1 | {LANDING,WINE,HELP,WINE,CHECKOUT}
+        103 |        2 | {WINE,HELP,WINE,CHECKOUT}
+        103 |        3 | {HELP,WINE,CHECKOUT}
+        103 |        4 | {WINE,CHECKOUT}
+        104 |        1 | {BEER,CHECKOUT}
+        108 |        1 | {BEER,WINE,CHECKOUT}
+        108 |        2 | {WINE,CHECKOUT}
+(14 rows)
+</pre> With overlap turned off, the result would be: <pre class="result">
+ session_id | match_id |             page_path             
+------------+----------+-----------------------------------
+        100 |        1 | {LANDING,WINE,CHECKOUT}
+        102 |        1 | {LANDING,WINE,CHECKOUT}
+        102 |        2 | {LANDING,HELP,WINE,CHECKOUT}
+        103 |        1 | {LANDING,WINE,HELP,WINE,CHECKOUT}
+        104 |        1 | {BEER,CHECKOUT}
+        108 |        1 | {BEER,WINE,CHECKOUT}
+(6 rows)
+</pre></li>
+</ol>
+<p><a class="anchor" id="note"></a></p><dl class="section 
note"><dt>Note</dt><dd>Please note some current limitations of the path 
algorithm.<ul>
 <li>Window functions cannot currently be used in the parameter 
'aggregate_func'. Instead, output the pattern matches and write a SQL query 
with a window function over the output tuples.</li>
-<li>Overlapping pattern matches are not supported. That is, a given row can 
only belong to one pattern match (non-overlapping).</li>
 <li>A given row can only match one symbol. If a row matches multiple symbols, 
the symbol that comes <em>first</em> in the symbol definition list will take 
precedence.</li>
 <li>Maximum number of symbols that can be defined is 35.</li>
 <li>The columns 'match_id' and 'symbol' are generated by the path algorithm. 
If coincidentally you have columns in your input data named 'match_id' or 
'symbol', the system generated column names will be changed to 
"__madlib_path_match_id__" and "__madlib_path_symbol__"</li>
@@ -415,7 +465,7 @@ LIMIT 10;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:11 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
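
The difference between the two result tables in the 'overlapping_patterns' example above can be illustrated with a small Python sketch. It is not MADlib's implementation: find_matches is a hypothetical helper, each row is assumed to have already been mapped to a one-character symbol ("n" for nobuy, "b" for buy), and Python's re module stands in for the pattern matcher.

```python
import re


def find_matches(symbols, pattern, overlapping=False):
    """Return (start, end) spans of pattern matches in a symbol string.

    symbols: one character per row, already ordered within a partition
    pattern: regular expression over those characters
    overlapping: if True, try a match at every start position, even when
                 that row was already part of an earlier match
    """
    regex = re.compile(pattern)
    if overlapping:
        spans = []
        for start in range(len(symbols)):
            m = regex.match(symbols, start)  # anchored at `start`
            if m:
                spans.append(m.span())
        return spans
    # Non-overlapping: each search resumes after the previous match ends.
    return [m.span() for m in regex.finditer(symbols)]


# Symbol string consistent with the session 102 rows in the result tables:
# LANDING, WINE, CHECKOUT, LANDING, HELP, WINE, CHECKOUT
session_102 = "nnbnnnb"
print(find_matches(session_102, "n+b"))
print(find_matches(session_102, "n+b", overlapping=True))
```

With overlap off the sketch yields two spans for this string, and with overlap on it yields five, which lines up with the two and five rows shown for session 102 in the documented results.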

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__pca.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__pca.html b/docs/latest/group__grp__pca.html
index 80e7174..c0db3ed 100644
--- a/docs/latest/group__grp__pca.html
+++ b/docs/latest/group__grp__pca.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -128,7 +128,7 @@ Modules</h2></td></tr>
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__pca__project.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__pca__project.html 
b/docs/latest/group__grp__pca__project.html
index 8d2fbf0..7e37313 100644
--- a/docs/latest/group__grp__pca__project.html
+++ b/docs/latest/group__grp__pca__project.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -145,7 +145,7 @@ madlib.pca_sparse_project( source_table,
 </pre></dd></dl>
 <dl class="section user"><dt>Arguments</dt><dd><dl class="arglist">
 <dt>source_table </dt>
-<dd><p class="startdd">TEXT. Source table name. Identical to <a class="el" 
href="pca_8sql__in.html#a31abf88e67a446a4f789764aa2c61e85">pca_train</a>, the 
input data matrix should have <img class="formulaInl" alt="$ N $" 
src="form_218.png"/> rows and <img class="formulaInl" alt="$ M $" 
src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" 
src="form_218.png"/> is the number of data points, and <img class="formulaInl" 
alt="$ M $" src="form_174.png"/> is the number of features for each data 
point.</p>
+<dd><p class="startdd">TEXT. Source table name. Identical to <a class="el" 
href="pca_8sql__in.html#a31abf88e67a446a4f789764aa2c61e85">pca_train</a>, the 
input data matrix should have <img class="formulaInl" alt="$ N $" 
src="form_219.png"/> rows and <img class="formulaInl" alt="$ M $" 
src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" 
src="form_219.png"/> is the number of data points, and <img class="formulaInl" 
alt="$ M $" src="form_174.png"/> is the number of features for each data 
point.</p>
 <p>The input table for <em> pca_project </em> is expected to be in the one of 
the two standard MADlib dense matrix formats, and the sparse input table for 
<em> pca_sparse_project </em> should be in the standard MADlib sparse matrix 
format. These formats are described in the documentation for <a class="el" 
href="pca_8sql__in.html#a31abf88e67a446a4f789764aa2c61e85">pca_train</a>.</p>
 <p class="enddd"></p>
 </dd>
@@ -260,19 +260,19 @@ SELECT * FROM result_summary_table;
 </ul>
 </dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section 
user"><dt>Technical Background</dt><dd></dd></dl>
-<p>Given a table containing some principal components <img class="formulaInl" 
alt="$ \boldsymbol P $" src="form_229.png"/> and some input data <img 
class="formulaInl" alt="$ \boldsymbol X $" src="form_219.png"/>, the 
low-dimensional representation <img class="formulaInl" alt="$ {\boldsymbol X}' 
$" src="form_230.png"/> is computed as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\begin{align*} {\boldsymbol {\hat{X}}} &amp; = 
{\boldsymbol X} - \vec{e} \hat{x}^T \\ {\boldsymbol X}' &amp; = {\boldsymbol 
{\hat {X}}} {\boldsymbol P}. \end{align*}" src="form_231.png"/>
+<p>Given a table containing some principal components <img class="formulaInl" 
alt="$ \boldsymbol P $" src="form_230.png"/> and some input data <img 
class="formulaInl" alt="$ \boldsymbol X $" src="form_220.png"/>, the 
low-dimensional representation <img class="formulaInl" alt="$ {\boldsymbol X}' 
$" src="form_231.png"/> is computed as </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\begin{align*} {\boldsymbol {\hat{X}}} &amp; = 
{\boldsymbol X} - \vec{e} \hat{x}^T \\ {\boldsymbol X}' &amp; = {\boldsymbol 
{\hat {X}}} {\boldsymbol P}. \end{align*}" src="form_232.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$\hat{x} $" src="form_232.png"/> is the 
column means of <img class="formulaInl" alt="$ \boldsymbol X $" 
src="form_219.png"/> and <img class="formulaInl" alt="$ \vec{e} $" 
src="form_224.png"/> is the vector of all ones. This step is equivalent to 
centering the data around the origin.</p>
-<p>The residual table <img class="formulaInl" alt="$ \boldsymbol R $" 
src="form_233.png"/> is a measure of how well the low-dimensional 
representation approximates the true input data, and is computed as </p><p 
class="formulaDsp">
-<img class="formulaDsp" alt="\[ {\boldsymbol R} = {\boldsymbol {\hat{X}}} - 
{\boldsymbol X}' {\boldsymbol P}^T. \]" src="form_234.png"/>
+<p> where <img class="formulaInl" alt="$\hat{x} $" src="form_233.png"/> is the 
column means of <img class="formulaInl" alt="$ \boldsymbol X $" 
src="form_220.png"/> and <img class="formulaInl" alt="$ \vec{e} $" 
src="form_225.png"/> is the vector of all ones. This step is equivalent to 
centering the data around the origin.</p>
+<p>The residual table <img class="formulaInl" alt="$ \boldsymbol R $" 
src="form_234.png"/> is a measure of how well the low-dimensional 
representation approximates the true input data, and is computed as </p><p 
class="formulaDsp">
+<img class="formulaDsp" alt="\[ {\boldsymbol R} = {\boldsymbol {\hat{X}}} - 
{\boldsymbol X}' {\boldsymbol P}^T. \]" src="form_235.png"/>
 </p>
 <p> A residual matrix with entries mostly close to zero indicates a good 
representation.</p>
-<p>The residual norm <img class="formulaInl" alt="$ r $" src="form_235.png"/> 
is simply </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ r = \|{\boldsymbol R}\|_F \]" 
src="form_236.png"/>
+<p>The residual norm <img class="formulaInl" alt="$ r $" src="form_236.png"/> 
is simply </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ r = \|{\boldsymbol R}\|_F \]" 
src="form_237.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ \|\cdot\|_F $" src="form_237.png"/> 
is the Frobenius norm. The relative residual norm <img class="formulaInl" 
alt="$ r' $" src="form_238.png"/> is </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ r' = \frac{ \|{\boldsymbol R}\|_F 
}{\|{\boldsymbol X}\|_F } \]" src="form_239.png"/>
+<p> where <img class="formulaInl" alt="$ \|\cdot\|_F $" src="form_238.png"/> 
is the Frobenius norm. The relative residual norm <img class="formulaInl" 
alt="$ r' $" src="form_239.png"/> is </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ r' = \frac{ \|{\boldsymbol R}\|_F 
}{\|{\boldsymbol X}\|_F } \]" src="form_240.png"/>
 </p>
 <p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related 
Topics</dt><dd>File <a class="el" href="pca__project_8sql__in.html" 
title="Principal Component Analysis Projection. ">pca_project.sql_in</a> 
documenting the SQL functions</dd></dl>
 <p><a class="el" href="group__grp__pca__train.html">Principal Component 
Analysis</a> </p>
@@ -281,7 +281,7 @@ SELECT * FROM result_summary_table;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
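
The projection formulas in the Technical Background hunk above can be checked by hand on a tiny example. The following is a minimal pure-Python sketch, not MADlib's implementation: the helper names (center, matmul, transpose, frobenius, project) are made up for illustration, and P is assumed to be an M x k matrix whose columns are the principal components.

```python
import math

def center(X):
    """Subtract the column means from every row: X_hat = X - e * x_hat^T."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in A for v in row))

def project(X, P):
    """Return (X', R, r, r') as defined in the text.

    X is N x M, P is M x k (assumed layout: one principal component per column).
    """
    X_hat = center(X)
    X_proj = matmul(X_hat, P)                    # X' = X_hat P
    back = matmul(X_proj, transpose(P))          # X' P^T
    R = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(X_hat, back)]
    r = frobenius(R)                             # residual norm
    return X_proj, R, r, r / frobenius(X)        # relative norm uses the raw X

# Toy data lying exactly on the line y = x, projected onto that direction.
s = 1.0 / math.sqrt(2.0)
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
P = [[s], [s]]
X_proj, R, r, r_rel = project(X, P)
```

Because the toy points lie on the chosen component, the residual matrix is zero up to floating-point rounding, which matches the remark that a residual with entries close to zero indicates a good representation.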

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__pca__train.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__pca__train.html 
b/docs/latest/group__grp__pca__train.html
index 94d6ce6..7853e57 100644
--- a/docs/latest/group__grp__pca__train.html
+++ b/docs/latest/group__grp__pca__train.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" 
src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -152,7 +152,7 @@ pca_sparse_train( source_table,
 </pre></dd></dl>
 <p><b>Arguments</b> </p><dl class="arglist">
 <dt>source_table </dt>
-<dd><p class="startdd">TEXT. Name of the input table containing the data for 
PCA training. The input data matrix should have <img class="formulaInl" alt="$ 
N $" src="form_218.png"/> rows and <img class="formulaInl" alt="$ M $" 
src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" 
src="form_218.png"/> is the number of data points, and <img class="formulaInl" 
alt="$ M $" src="form_174.png"/> is the number of features for each data 
point.</p>
+<dd><p class="startdd">TEXT. Name of the input table containing the data for 
PCA training. The input data matrix should have <img class="formulaInl" alt="$ 
N $" src="form_219.png"/> rows and <img class="formulaInl" alt="$ M $" 
src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" 
src="form_219.png"/> is the number of data points, and <img class="formulaInl" 
alt="$ M $" src="form_174.png"/> is the number of features for each data 
point.</p>
 <p>A dense input table is expected to be in one of the two standard MADlib 
dense matrix formats, and a sparse input table should be in the standard MADlib 
sparse matrix format.</p>
 <p>The two standard MADlib dense matrix formats are </p><pre>{TABLE|VIEW} 
<em>source_table</em> (
     <em>row_id</em> INTEGER,
@@ -307,14 +307,14 @@ SELECT * FROM result_table;
 <li>If both 'lanczos_iter' and proportion of variance (via the 
'components_param' parameter) are defined, 'lanczos_iter' will take precedence 
in determining the number of principal components (i.e. the number of principal 
components will not be greater than 'lanczos_iter' even if the target 
proportion had not been reached).</li>
 </ul>
 <p><a class="anchor" id="background_pca"></a></p><dl class="section 
user"><dt>Technical Background</dt><dd></dd></dl>
-<p>The PCA implemented here uses an SVD decomposition implementation to 
recover the principal components (as opposed to the directly computing the 
eigenvectors of the covariance matrix). Let <img class="formulaInl" alt="$ 
\boldsymbol X $" src="form_219.png"/> be the data matrix, and let <img 
class="formulaInl" alt="$ \hat{x} $" src="form_220.png"/> be a vector of the 
column averages of <img class="formulaInl" alt="$ \boldsymbol{X}$" 
src="form_221.png"/>. PCA computes the matrix <img class="formulaInl" alt="$ 
\hat{\boldsymbol X} $" src="form_222.png"/> as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol X} - 
\vec{e} \hat{x}^T \]" src="form_223.png"/>
+<p>The PCA implemented here uses an SVD decomposition implementation to 
recover the principal components (as opposed to the directly computing the 
eigenvectors of the covariance matrix). Let <img class="formulaInl" alt="$ 
\boldsymbol X $" src="form_220.png"/> be the data matrix, and let <img 
class="formulaInl" alt="$ \hat{x} $" src="form_221.png"/> be a vector of the 
column averages of <img class="formulaInl" alt="$ \boldsymbol{X}$" 
src="form_222.png"/>. PCA computes the matrix <img class="formulaInl" alt="$ 
\hat{\boldsymbol X} $" src="form_223.png"/> as </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol X} - 
\vec{e} \hat{x}^T \]" src="form_224.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ \vec{e} $" src="form_224.png"/> is 
the vector of all ones.</p>
+<p> where <img class="formulaInl" alt="$ \vec{e} $" src="form_225.png"/> is 
the vector of all ones.</p>
 <p>PCA then computes the SVD matrix factorization </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol 
U}{\boldsymbol \Sigma}{\boldsymbol V}^T \]" src="form_225.png"/>
+<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol 
U}{\boldsymbol \Sigma}{\boldsymbol V}^T \]" src="form_226.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ {\boldsymbol \Sigma} $" 
src="form_226.png"/> is a diagonal matrix. The eigenvalues are recovered as the 
entries of <img class="formulaInl" alt="$ {\boldsymbol \Sigma}/(\sqrt{(N-1)} $" 
src="form_546.png"/>, and the principal components are the rows of <img 
class="formulaInl" alt="$ {\boldsymbol V} $" src="form_228.png"/>. The 
reasoning behind using N − 1 instead of N to calculate the covariance is <a 
href="https://en.wikipedia.org/wiki/Bessel%27s_correction">Bessel's 
correction</a>.</p>
+<p> where <img class="formulaInl" alt="$ {\boldsymbol \Sigma} $" 
src="form_227.png"/> is a diagonal matrix. The eigenvalues are recovered as the 
entries of <img class="formulaInl" alt="$ {\boldsymbol \Sigma}/(\sqrt{(N-1)} $" 
src="form_228.png"/>, and the principal components are the rows of <img 
class="formulaInl" alt="$ {\boldsymbol V} $" src="form_229.png"/>. The 
reasoning behind using N − 1 instead of N to calculate the covariance is <a 
href="https://en.wikipedia.org/wiki/Bessel%27s_correction">Bessel's 
correction</a>.</p>
 <p>It is important to note that the PCA implementation assumes that the user 
will use only the principal components that have non-zero eigenvalues. The SVD 
calculation is done with the Lanczos method, which does not guarantee 
correctness for singular vectors with zero-valued eigenvalues. Consequently, 
principal components with zero-valued eigenvalues are not guaranteed to be 
correct. Generally, this will not be a problem unless the user wants to use the 
principal components for the entire eigenspectrum.</p>
 <p><a class="anchor" id="literature"></a></p><dl class="section 
user"><dt>Literature</dt><dd></dd></dl>
 <p>[1] Principal Component Analysis. <a 
href="http://en.wikipedia.org/wiki/Principal_component_analysis">http://en.wikipedia.org/wiki/Principal_component_analysis</a></p>
@@ -327,7 +327,7 @@ SELECT * FROM result_table;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
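
The link between the SVD in the Technical Background hunk above and the covariance matrix can be spelled out in a few lines. This is a reader's derivation under the standard assumption that U has orthonormal columns; the symbols C, sigma_i and lambda_i are introduced here for illustration and do not appear in the committed docs.

```latex
\begin{align*}
\hat{\boldsymbol X} &= {\boldsymbol U}{\boldsymbol \Sigma}{\boldsymbol V}^T,
  \qquad {\boldsymbol U}^T{\boldsymbol U} = {\boldsymbol I} \\
{\boldsymbol C} &= \frac{1}{N-1}\,\hat{\boldsymbol X}^T \hat{\boldsymbol X}
  = \frac{1}{N-1}\,{\boldsymbol V}{\boldsymbol \Sigma}^T{\boldsymbol U}^T
    {\boldsymbol U}{\boldsymbol \Sigma}{\boldsymbol V}^T
  = {\boldsymbol V}\,\frac{{\boldsymbol \Sigma}^T{\boldsymbol \Sigma}}{N-1}\,{\boldsymbol V}^T \\
\lambda_i &= \frac{\sigma_i^2}{N-1},
  \qquad \sqrt{\lambda_i} = \frac{\sigma_i}{\sqrt{N-1}}
\end{align*}
```

Read this way, the entries of Sigma / sqrt(N - 1) quoted in the text are the square roots of the covariance eigenvalues lambda_i, that is, the standard deviations along each principal component. Whether the docs intend "eigenvalues" in that looser sense is an interpretation, not something the commit states.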
