I'm going to send in basically the textbook description of SVD, in case someone wants to implement it. If you guys get your householder reductions, then that's the most difficult part. I have code for all of this, but it is held up by legal, so I'll just give you guys the basic ideas instead.
Forgive me if this seems like condescension, you can find this in any decent math book, and it is about one of the most well understood algorithms out there. However, since you're so concerned with clean room implementations, here I'll describe one from first principals, so you can be confident that its clean if someone implements it. Here's the idea. Lets assume the matrix A is n by m, and A` would be A-transpose. n >= m, in this case, so A`A (m by m) is smaller than AA` (n by n). For a matrix A, it can be broken down as A = USV`, where U and V are orthogonal, and S is diagonal. Let look at eigenvectors for a moment. Eigenvector is a vector of the form Ab = ub, for some scalar (eigenvalue) u. This means that given an Eigenvector matrix D (which is orthogonal), where the eigenvectors make up the columns of D, this eigenvector equation can be written as AD = ZD, where Z is the diagonal matrix of the eigenvalues. Rearranging, D`AD = Z, the eigenvector matrices will diagonalize the matrix A. Much talk about the number of eigenvectors and their orthogonality can be found almost anywhere. Now imagine A`A (A-transpose multiplied by A, a symmetrical matrix). Given that A = USV`, A`A = VSU`USV` = VSSV`. SS, is really just a diagonal matrix of values, lets call it (suggestively) Z. A`A = VZV`, so if we can find the eigenvectors of A`A (which is symmetric), and the eigenvalues Z, then we can compute the elemenst of S (taking the square root of each element of Z). Given that we have V, and S, we can compute AV(S-inverse) = U, and get the other part of the decomposition. So, how do we find the eigenvectors? Easily enough. 1) Start by householder transforming a symmetric matrix into tridiagonal form by multiplying by householder matrixes H, like so A2 = H1A1H1`. This will reduce to tridiagonal form after N-2 such multiplications, producing Q1` = H1`H2`H3`.... 2) Reduce to diagonal form using jacoby reductions with Givens plane rotations. Because of the simple form, literally mulitply it out by hand, and you'll see that only a few elements of A change wiith each reduction, and the formulas for the new values there are quite easy. Use this simplified code to reduce A. Each step here produces a Givens Plane Rotation matrix G, such that A2 = G1A1G1`. Again, Q2 = G1`G2`G3`.... These multiplications are easy, as only two rows and two columns of Q2 change with each new plane rotation. This means that Z = Q2Q1AQ1`Q2`, or that V = Q2Q1. A has been diagonalized when this process is finished. 3) Now that we have Z and V, the rest is just two matrix multiplications away. Once again, the givens reductions are really trivial. remember, a givens matrix is like this..... 1 0 0 0 0 r s 0 0 -s r 0 0 0 0 1 Make sure that the box of values ends up at the correct position to wipe out one of the off diagonal elements (the one in the position of -s, here element 3, 2). It's easy to compute which s and r are needed to eliminate the element, do it by hand and put the equation into the code. Do the multiplication by hand, and put those equations into the code as well. G1A1G1` can be multiplied out by hand, and only a half a dozen elemnts of A change, A is symmetrical, so you only have to consider half of those, something like 4 elements. Really easy. The off-diagonal elements will come back each time. compute a new matrix and wipe them out again. They will be smaller each time, and soon you'll reduce them down to the level where you can just throw them away. Perhaps when the off diagonal element divided by the diagonal is less than 10^-14 or so. Then make a new matrix and get the next elements. The next matrix after the one above would be this one. 1 0 0 0 0 1 0 0 0 0 r s 0 0 -s r And so on, get them one at a time. Ok, there you go. The complete algorithm. Efficient householder reductions, and this algorithm will be efficient.