For the benefit of people on the Racket Users list, we're discussing the Conjugate Gradient (CG) method applied to Least Squares problems, as described in the original paper by Hestenes and Stiefel:

http://www.math.purdue.edu/~lucier/jresv49n6p409_A1b.pdf

The formulas to use are (10:2) on page 16 of the file (page 424 in the original). (I get only the Racket Users digest, and that information wasn't included there.)
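
For readers without the PDF handy: those recurrences are what is nowadays called CGLS. Up to notation (the symbols below are my own choice, so check them against the paper's (10:2)), they read as follows, starting from an initial guess $x_0$:

$r_0 = b - A x_0$, $p_0 = A^* r_0$,
$a_k = \|A^* r_k\|^2 / \|A p_k\|^2$,
$x_{k+1} = x_k + a_k p_k$,
$r_{k+1} = r_k - a_k A p_k$,
$b_k = \|A^* r_{k+1}\|^2 / \|A^* r_k\|^2$,
$p_{k+1} = A^* r_{k+1} + b_k p_k$.

In exact arithmetic this converges to a minimizer of $\|b - A x\|$ without ever forming $A^* A$.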

On 12/02/2016 01:52 AM, John Clements wrote:

Okay, I’ve taken a crack at implementing this… well, I have a toy implementation that doesn’t do any checking and only works for a particular matrix shape.

Short version: yep, it works!

Now, a few questions.

1) What should one use as the initial estimate x_0? The method appears to work fine for an initial estimate that is uniformly zero. Is there any reason not to use this?

$x_0 = 0$ works fine.

2) When does one stop?

In exact arithmetic, the algorithm stops when $A^*r_k$ is the zero vector, where $r_k = b - A x_k$ is the residual at the $k$th step.

In exact arithmetic, this is guaranteed to happen when $k$ is less than or equal to the number of columns of $A$.

In floating-point arithmetic, it is not guaranteed that $A^*r_k$ is ever the zero vector.

I have not read the paper carefully, but it appears that it’s intended to 
“halt” in approximately ’n’ steps, where ’n’ is the … number of rows?

The number of columns.

In the one case I tried, my “direction” vector p_i quickly dropped to something 
on the order of [1e-16, 1e-16], and in fact it became perfectly stable, in that 
successive iterations produced precisely the same values for the three 
iteration variables. However, it wouldn’t surprise me to discover that there 
were cases in which the answer oscillated between two minutely different
values. Can you shed any light on when it’s safe to stop in the general case?

If the columns of $A$ are normalized to all have size about 1, then I would stop when $\|A^*r_k\|$ is about round-off size, or when $k$ hits the number of columns, whichever is earlier.
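
To make that concrete, here is a minimal Racket sketch using the `math/matrix` library. The function name `cgls`, the tolerance `tol`, and the little 3x2 example are my own inventions for illustration, not anything from the paper; it implements the recurrences above with the stopping rule just described.

#lang racket
(require math/matrix)

;; Sketch of CGLS (names and the tolerance are my own choices).
;; Stops when ||A^T r_k|| is about round-off size or when k reaches
;; the number of columns of A, whichever comes first.
(define (cgls A b [tol 1e-14])
  (define n  (matrix-num-cols A))
  (define At (matrix-transpose A))
  (define s0 (matrix* At b))              ; A^T r_0, taking x_0 = 0
  (let loop ([x (make-matrix n 1 0.0)]    ; x_0 = 0
             [r b]                        ; r_0 = b - A x_0 = b
             [s s0]                       ; s = A^T r_k
             [p s0]                       ; p_0 = A^T r_0
             [k 0])
    (define s2 (matrix-dot s s))          ; ||A^T r_k||^2
    (if (or (>= k n) (<= (sqrt s2) tol))
        x
        (let* ([q     (matrix* A p)]      ; A p_k
               [alpha (/ s2 (matrix-dot q q))]
               [x+    (matrix+ x (matrix-scale p alpha))]
               [r+    (matrix- r (matrix-scale q alpha))]
               [s+    (matrix* At r+)]    ; A^T r_{k+1}
               [beta  (/ (matrix-dot s+ s+) s2)]
               [p+    (matrix+ s+ (matrix-scale p beta))])
          (loop x+ r+ s+ p+ (add1 k))))))

;; Made-up 3x2 example; its least-squares solution is (2/3, 1/2).
(define A (matrix [[1.0 1.0] [1.0 2.0] [1.0 3.0]]))
(define b (col-matrix [1.0 2.0 2.0]))
(cgls A b)

On that toy system the loop halts after at most two steps, i.e. the number of columns, which matches the exact-arithmetic bound above.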

In statistical data analysis, many people compute only a few steps, think of the search direction vectors $p_k$ as "factors", and derive some information from this. With only one RHS vector $b$, this is the same as Partial Least Squares (PLS).

3) Do you have any idea how this technique compares to any described in Numerical Recipes or implemented in LAPACK?

I'm not an expert in numerical linear algebra or least squares problems. It appears that LAPACK uses a so-called $QR$ factorization of $A$ to solve linear least squares problems. I don't know how to compare them.

Matlab probably has its own methods.

There is an iterative method called LSQR described here:

https://web.stanford.edu/class/cme324/paige-saunders2.pdf

It is in some sense a more stable version of CG (in exact arithmetic they generate the same sequence of approximations $x_k$), and it is compared to CG in section 7 of that paper.

I'm not saying that CG is a particularly good method to solve this problem, just that forming and solving $A^* A x = A^* b$ is a particularly *bad* method.
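
One standard way to quantify "bad" here (assuming $A$ has full column rank): the spectral condition numbers satisfy

$\kappa_2(A^* A) = \kappa_2(A)^2$,

so forming $A^* A$ explicitly squares the conditioning of the data you hand to the solver, whereas the CG recurrences above only ever apply $A$ and $A^*$ to vectors.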

Brad
