For the benefit of people on the Racket Users list, we're discussing the Conjugate Gradient (CG) method applied to Least Squares problems, as described in the original paper by Hestenes and Stiefel:

http://www.math.purdue.edu/~lucier/jresv49n6p409_A1b.pdf

The formulas to use are (10:2) on page 16 of the file (page 424 in the original). (I get only the Racket Users digest, and that information wasn't included there.)
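
For readers without the PDF handy: those recurrences are what is nowadays called CGLS. Up to notation (the symbols below are my own choice, so check them against the paper's (10:2)), they read as follows, starting from an initial guess $x_0$:

$r_0 = b - A x_0$, $p_0 = A^* r_0$,
$a_k = \|A^* r_k\|^2 / \|A p_k\|^2$,
$x_{k+1} = x_k + a_k p_k$,
$r_{k+1} = r_k - a_k A p_k$,
$b_k = \|A^* r_{k+1}\|^2 / \|A^* r_k\|^2$,
$p_{k+1} = A^* r_{k+1} + b_k p_k$.

In exact arithmetic this converges to a minimizer of $\|b - A x\|$ without ever forming $A^* A$.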

On 12/02/2016 01:52 AM, John Clements wrote:

Okay, I’ve taken a crack at implementing this… well, I have a toy implementation that doesn’t do any checking and only works for a particular matrix shape.

Short version: yep, it works!

Now, a few questions.

1) What should one use as the initial estimate x_0? The method appears to work fine for an initial estimate that is uniformly zero. Is there any reason not to use this?

$x_0 = 0$ works fine.

2) When does one stop?

In exact arithmetic, the algorithm stops when $A^*r_k$ is the zero vector, where $r_k = b - A x_k$ is the residual at the $k$th step.

In exact arithmetic, this is guaranteed to happen when $k$ is less than or equal to the number of columns of $A$.

In floating-point arithmetic, it is not guaranteed that $A^*r_k$ is ever the zero vector.

I have not read the paper carefully, but it appears that it’s intended to 
“halt” in approximately ’n’ steps, where ’n’ is the … number of rows?

The number of columns.

In the one case I tried, my “direction” vector p_i quickly dropped to something 
on the order of [1e-16, 1e-16], and in fact it became perfectly stable, in that 
successive iterations produced precisely the same values for the three 
iteration variables. However, it wouldn’t surprise me to discover that there 
were cases in which the answer oscillated between two minutely different
values. Can you shed any light on when it’s safe to stop in the general case?

If the columns of $A$ are normalized to all have size about 1, then I would stop when $\|A^*r_k\|$ is about round-off size, or when $k$ hits the number of columns, whichever is earlier.
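
To make that concrete, here is a minimal Racket sketch using the `math/matrix` library. The function name `cgls`, the tolerance `tol`, and the little 3x2 example are my own inventions for illustration, not anything from the paper; it implements the recurrences above with the stopping rule just described.

#lang racket
(require math/matrix)

;; Sketch of CGLS (names and the tolerance are my own choices).
;; Stops when ||A^T r_k|| is about round-off size or when k reaches
;; the number of columns of A, whichever comes first.
(define (cgls A b [tol 1e-14])
  (define n  (matrix-num-cols A))
  (define At (matrix-transpose A))
  (define s0 (matrix* At b))              ; A^T r_0, taking x_0 = 0
  (let loop ([x (make-matrix n 1 0.0)]    ; x_0 = 0
             [r b]                        ; r_0 = b - A x_0 = b
             [s s0]                       ; s = A^T r_k
             [p s0]                       ; p_0 = A^T r_0
             [k 0])
    (define s2 (matrix-dot s s))          ; ||A^T r_k||^2
    (if (or (>= k n) (<= (sqrt s2) tol))
        x
        (let* ([q     (matrix* A p)]      ; A p_k
               [alpha (/ s2 (matrix-dot q q))]
               [x+    (matrix+ x (matrix-scale p alpha))]
               [r+    (matrix- r (matrix-scale q alpha))]
               [s+    (matrix* At r+)]    ; A^T r_{k+1}
               [beta  (/ (matrix-dot s+ s+) s2)]
               [p+    (matrix+ s+ (matrix-scale p beta))])
          (loop x+ r+ s+ p+ (add1 k))))))

;; Made-up 3x2 example; its least-squares solution is (2/3, 1/2).
(define A (matrix [[1.0 1.0] [1.0 2.0] [1.0 3.0]]))
(define b (col-matrix [1.0 2.0 2.0]))
(cgls A b)

On that toy system the loop halts after at most two steps, i.e. the number of columns, which matches the exact-arithmetic bound above.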

In statistical data analysis, many people compute only a few steps, think of the search direction vectors $p_k$ as "factors", and derive some information from this. With only one RHS vector $b$, this is the same as Partial Least Squares (PLS).

3) Do you have any idea how this technique compares to any described in Numerical Recipes or implemented in LAPACK?

I'm not an expert in numerical linear algebra or least squares problems. It appears that LAPACK uses a so-called $QR$ factorization of $A$ to solve linear least squares problems. I don't know how to compare them.

Matlab probably has its own methods.

There is an iterative method called LSQR described here:

https://web.stanford.edu/class/cme324/paige-saunders2.pdf

It is in some sense a more stable version of CG (in exact arithmetic they generate the same sequence of approximations $x_k$), and it is compared to CG in section 7 of that paper.

I'm not saying that CG is a particularly good method to solve this problem, just that forming and solving $A^* A x = A^* b$ is a particularly *bad* method.
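
One standard way to quantify "bad" here (assuming $A$ has full column rank): the spectral condition numbers satisfy

$\kappa_2(A^* A) = \kappa_2(A)^2$,

so forming $A^* A$ explicitly squares the conditioning of the data you hand to the solver, whereas the CG recurrences above only ever apply $A$ and $A^*$ to vectors.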

Brad
