Sorry about the late reply; I was on holiday for practically all of April, and now I find myself occupied with final exams.
+1 - see below. The only real question here is whether we need a subpackage for matrix decompositions. Since I think it is unlikely that we will ever have more than a handful of these, I am OK with putting them at the top level, i.e. in .linear.
That seems fine to me.
> I also had a look at Jama yesterday. There they defer the explicit
> generation of the Q part of the decomposition until the user calls
> getQ(), which I guess has a computational advantage over calculating
> the whole decomp if the user of the API only needs R. This of course
> implies that the algorithm has a state and it's most natural to
> implement it as a class of its own.

Again, I think this should be a separate (immutable) class with state, with the decomp done in the constructor, which should take a RealMatrix (not the impl) as argument (using getData to copy if the argument is not a RealMatrixImpl). I am not sure I understand what you mean about the Q and R accessors in Jama. It looks to me like they are just doing transformations to provide Q and R separately. I think it makes sense to provide those accessors (as we should in the LU case when we externalize that).
The algorithm used there produces the matrix R and an array of Householder vectors. Only when getQ() is called are the Householder vectors turned into matrices and multiplied together to yield the Q matrix. This seems to be the best way to go about things.
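To make the deferred-Q idea concrete, here is a minimal sketch in the style of Jama's QRDecomposition: the constructor does the Householder reduction once and stores the reflection vectors in the strict lower triangle of the working array, getR() just reads the packed storage, and Q is only materialized when getQ() is called. The class name LazyQR and the packed-storage layout are illustrative, not existing [math] code.

```java
/** Sketch of a QR decomposition with lazily materialized Q, in the style
 *  of Jama. Not Commons Math code; names are illustrative. */
class LazyQR {
    private final double[][] qr;   // R above the diagonal, Householder vectors below
    private final double[] rdiag;  // diagonal of R
    private final int m, n;

    LazyQR(double[][] a) {
        m = a.length;
        n = a[0].length;
        qr = new double[m][];
        for (int i = 0; i < m; i++) qr[i] = a[i].clone();
        rdiag = new double[n];
        // Householder reduction: after this loop the strict lower triangle
        // of qr holds the Householder vectors; the rest holds R.
        for (int k = 0; k < n; k++) {
            double nrm = 0;
            for (int i = k; i < m; i++) nrm = Math.hypot(nrm, qr[i][k]);
            if (nrm != 0) {
                if (qr[k][k] < 0) nrm = -nrm;
                for (int i = k; i < m; i++) qr[i][k] /= nrm;
                qr[k][k] += 1.0;
                // apply the reflection to the remaining columns
                for (int j = k + 1; j < n; j++) {
                    double s = 0;
                    for (int i = k; i < m; i++) s += qr[i][k] * qr[i][j];
                    s = -s / qr[k][k];
                    for (int i = k; i < m; i++) qr[i][j] += s * qr[i][k];
                }
            }
            rdiag[k] = -nrm;
        }
    }

    /** Upper-triangular factor, read directly from the packed storage. */
    double[][] getR() {
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                r[i][j] = (i < j) ? qr[i][j] : (i == j ? rdiag[i] : 0);
        return r;
    }

    /** Q is only formed here, by accumulating the stored Householder
     *  reflections backwards; callers that need only R never pay for this. */
    double[][] getQ() {
        double[][] q = new double[m][n];
        for (int k = n - 1; k >= 0; k--) {
            for (int i = 0; i < m; i++) q[i][k] = 0;
            q[k][k] = 1.0;
            for (int j = k; j < n; j++) {
                if (qr[k][k] != 0) {
                    double s = 0;
                    for (int i = k; i < m; i++) s += qr[i][k] * q[i][j];
                    s = -s / qr[k][k];
                    for (int i = k; i < m; i++) q[i][j] += s * qr[i][k];
                }
            }
        }
        return q;
    }
}
```

A regression user that only needs R calls getR() and never triggers the loop in getQ(), which is exactly the computational advantage described above.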
> > From the release plan I read that the QR-decomposition will be needed
> > for linear regression. Does that mean that it will be used mainly for
> > least-squares fitting? In that case both Q and R are needed most of
> > the time, so having the algorithm in a separate class is not strictly
> > necessary.

The immediate motivation is for solving the normal equations. I don't think we should include the solve() method that Jama has in this class, though. I think it is more natural to have that in the OLS implementation. The tests are a good start.

Returning to the overall API design, I think it makes sense to follow the abstract factory pattern used elsewhere in [math] (e.g. the distributions package) to provide for pluggable decomp implementations with defaults provided. So what we would end up with is an abstract DecompositionFactory class with a concrete DecompositionFactoryImpl subclass providing default implementations. Interfaces for the decompositions would be abstracted out. User code would look like this:

QRDecomposition qr = DecompositionFactory.newInstance().createQRDecomposition(matrix);

where QRDecomposition is the interface, DecompositionFactory.newInstance() returns a DecompositionFactoryImpl, and createQRDecomposition(matrix) invokes the constructor of QRDecompositionImpl, which is the default implementation. This setup is used in the distributions and analysis packages to provide pluggable implementations.
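To make the proposed wiring explicit, here is a hypothetical sketch of the factory pattern described above. All class names are proposals from this thread; double[][] stands in for RealMatrix so the example is self-contained, and the default impl's accessors are stubbed since the decomposition itself is shown elsewhere.

```java
/* Hypothetical sketch of the proposed pluggable-factory API, modeled on
 * the pattern in the distributions package. Not existing [math] code. */
interface QRDecomposition {
    double[][] getQ();
    double[][] getR();
}

abstract class DecompositionFactory {
    /** Returns the default concrete factory; subclasses can be substituted
     *  by users who want a different implementation. */
    public static DecompositionFactory newInstance() {
        return new DecompositionFactoryImpl();
    }
    public abstract QRDecomposition createQRDecomposition(double[][] matrix);
}

class DecompositionFactoryImpl extends DecompositionFactory {
    @Override
    public QRDecomposition createQRDecomposition(double[][] matrix) {
        return new QRDecompositionImpl(matrix);
    }
}

/** Placeholder default implementation: the real one would perform the
 *  Householder reduction in the constructor (immutable, state set once). */
class QRDecompositionImpl implements QRDecomposition {
    private final double[][] matrix;
    QRDecompositionImpl(double[][] matrix) { this.matrix = matrix; }
    public double[][] getQ() { throw new UnsupportedOperationException("not shown"); }
    public double[][] getR() { throw new UnsupportedOperationException("not shown"); }
}
```

User code then reads exactly as in the snippet above: the static newInstance() hides which concrete factory is in play, so swapping in an alternative QR implementation only requires a different factory subclass.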
Seems good to me. What do you think is better as a method name, "createQRDecomposition" or "newQRDecomposition"? Both styles seem to be in use. I suppose we won't have a base interface for matrix decompositions?
To get started, we can just define QRDecomposition and QRDecompositionImpl. If there are no objections or better ideas, we can then add the factory impls and do the same for LU decomp (and Cholesky, which I think we may also have lying around somewhere).
Alright, I'm on it. Joni