Hey James.
There is an implementation of CCA in scikit-learn:
http://scikit-learn.org/dev/modules/cross_decomposition.html

Afaik it is not in great shape, though, and if you want to improve it,
that would be greatly appreciated.

Cheers,
Andy

On 08/12/2013 10:59 PM, James Jensen wrote:
Hello!

You may already be familiar with canonical correlation analysis (CCA). Given two sets of variables, CCA yields the linear combinations with maximum correlation between them. It is similar to PCA, which finds projections with maximum variance for a single set of variables; in fact, PCA can be treated as a special case of CCA. Sigg et al. 2007 (http://ml2.inf.ethz.ch/papers/2007/0_nonnegative_cca.pdf) show how canonical correlation can be solved by iterative regression. This provides a straightforward and flexible way to apply regularization penalties and useful constraints such as non-negativity in CCA. I think CCA would be useful to have in Scikit-learn, and that it would be relatively easy to write a CCA solver using Scikit-learn's existing linear methods. I have wanted to contribute something myself along these lines, but have met with some difficulties.
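To make the iterative-regression view concrete, here is a minimal NumPy sketch of the first canonical pair. This is purely my own illustration (the function name, defaults, and normalization choices are made up, and it is unregularized): regress the current Y-score onto X, then the X-score onto Y, and repeat.

```python
# Sketch only: alternating least-squares regressions for the first
# canonical pair, in the spirit of Sigg et al. 2007. Not scikit-learn API.
import numpy as np

def cca_first_pair(X, Y, n_iter=100, tol=1e-8, seed=0):
    """Estimate the first pair of canonical weights (a, b) by
    alternating regressions. X is (n, p), Y is (n, q)."""
    rng = np.random.RandomState(seed)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    b = rng.randn(Y.shape[1])
    b /= np.linalg.norm(Y @ b)              # unit-norm Y-score
    for _ in range(n_iter):
        # Regress the current Y-score onto X to update a ...
        a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)
        a /= np.linalg.norm(X @ a)          # unit-norm X-score
        # ... then regress the X-score onto Y to update b.
        b_new, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)
        b_new /= np.linalg.norm(Y @ b_new)
        if np.linalg.norm(b_new - b) < tol:
            b = b_new
            break
        b = b_new
    return a, b
```

With a regularized regression substituted for the plain least-squares steps, this is where the elastic net would slot in.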

CCA involves finding multiple directions that are orthogonal to each other, much like PCA. For all directions beyond the first, the objective function to be minimized includes not only the regression and regularization penalty terms, but also a penalty that enforces orthogonality of the solution; this term is zero within the feasible space.

Here is the objective function:

For the kth direction a_k,

a_k = \underset{a}{\arg\min\,} \frac{1}{2 n_{samples}} ||X a - Y b||_{2}^{2} + \alpha \rho ||a||_{1} + \frac{\alpha(1-\rho)}{2} ||a||_{2}^{2} + \lambda a^{\top} O a

which is the elastic net objective plus the orthogonality penalty term \lambda a^{\top} O a, where O = \sum_{l < k} {C_{XX} a_{l} a_{l}^{\top} C_{XX}}

and C_{XX} is the covariance matrix of X.
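For concreteness, the objective above translates directly into NumPy like this (purely illustrative; `prev_a` is a list holding the previously found directions a_l, and X, Y are assumed centered):

```python
# Direct evaluation of the penalized objective above; a sanity-check
# sketch, not an optimizer.
import numpy as np

def objective(a, b, X, Y, alpha, rho, lam, prev_a):
    n = X.shape[0]
    Cxx = (X.T @ X) / n                     # covariance of (centered) X
    # O = sum_{l < k} Cxx a_l a_l^T Cxx
    O = sum(Cxx @ np.outer(al, al) @ Cxx for al in prev_a)
    fit = 0.5 / n * np.sum((X @ a - Y @ b) ** 2)
    l1 = alpha * rho * np.sum(np.abs(a))
    l2 = 0.5 * alpha * (1 - rho) * np.dot(a, a)
    orth = lam * (a @ O @ a) if len(prev_a) > 0 else 0.0
    return fit + l1 + l2 + orth
```

Note that the orthogonality term is \lambda \sum_{l<k} (a^{\top} C_{XX} a_l)^2, so it is nonnegative and vanishes exactly when the new direction is C_{XX}-orthogonal to the earlier ones.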

Clearly I can't solve this using Scikit-learn's elastic net as-is. I thought perhaps I could reuse the same underlying coordinate descent code to optimize this new objective, but I have not been able to make sense of it. Could anyone give me some pointers? Of course, I would be delighted if any of the experienced developers were interested in implementing this themselves, since I am a novice, but I realize that they have many things to work on already.

Note: since appropriate regularization penalties might not be known beforehand, and different regularizations can be applied for each of the two sets of variables, it would be nice to also enable the use of cross-validation for model selection as in ElasticNetCV. But I should probably figure out the simpler case first.

I am also not confident about the details of how to arrive at a good value for \lambda. I understand that the basic idea is to begin with a very small value, so that we're close to an unconstrained solution to the problem (i.e. without the penalty term affecting the solution), and increase \lambda (using the previous solution as our initial guess in each iteration) until the solution no longer changes, i.e. we've converged to a solution that's within the feasible space. But how do I know how much to increase \lambda by at each iteration?
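For what it's worth, the kind of loop I have in mind looks like this. The geometric increase factor is an assumption on my part, not something from the paper, and `solve(a, lam)` stands for whatever routine minimizes the penalized objective at a fixed lambda:

```python
# Penalty-continuation sketch: increase lambda geometrically, warm-start
# each solve from the previous solution, stop when the solution is stable.
import numpy as np

def continuation(solve, a0, lam0=1e-4, factor=10.0, tol=1e-6, max_steps=20):
    a, lam = a0, lam0
    for _ in range(max_steps):
        a_new = solve(a, lam)               # warm start from previous a
        if np.linalg.norm(a_new - a) < tol:
            return a_new
        a, lam = a_new, lam * factor        # geometric increase (assumed)
    return a
```

My question is essentially whether `factor` should be fixed like this, or chosen adaptively.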

I'm curious about the motivation for using coordinate descent, and about any other methods that can be used for this sort of thing. From what I understand, the advantage of coordinate descent here is that the coefficients for the entire path of the regularization parameter can be found at very little extra cost. Aside from that, isn't it slower than gradient-based methods (i.e. takes more iterations to converge)? Although I guess in this case the derivative of the L1 norm is not defined where any element of the vector is zero, so perhaps gradient-based methods are out of the question? Are there other methods that can be used for elastic net that would accommodate the orthogonality penalty term? Could LARS be adapted to accommodate it? I see that Sigg et al. use monotone incremental forward stagewise regression (MIFSR), but they do it with an L1 penalty only, and I suspect that monotonicity would not hold for the elastic net penalty. If that's incorrect, let me know.
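For reference, here is my understanding of the plain elastic-net coordinate-descent update, i.e. a soft-thresholding step per coordinate. This is written from the standard formulation, not taken from scikit-learn's code, so the details are my own; it is the update I would need to modify to absorb the extra quadratic penalty:

```python
# Sketch of naive coordinate descent for the elastic net
# (1/2n)||y - Xa||^2 + alpha*rho*||a||_1 + alpha*(1-rho)/2*||a||^2.
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def cd_elastic_net(X, y, alpha, rho, n_iter=100):
    n, p = X.shape
    a = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n       # per-column scale ||X_j||^2 / n
    r = y.copy()                            # residual y - X @ a
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual
            z = X[:, j] @ r / n + col_sq[j] * a[j]
            a_j = soft_threshold(z, alpha * rho) / (col_sq[j] + alpha * (1 - rho))
            r += X[:, j] * (a[j] - a_j)     # keep residual in sync
            a[j] = a_j
    return a
```

Since the orthogonality term \lambda a^{\top} O a is quadratic in a, I would guess it changes each coordinate update only by shifting z and the denominator, which is part of why I hoped the existing coordinate descent code could be adapted.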

And I've been told there are probably more sophisticated methods for dealing with this kind of orthogonality constraint than this penalty method, but I don't know what these methods are. If anyone is familiar with this and could direct me to a good resource, I would appreciate it.

Sorry for the long email with many questions. Thanks for your help and for an extremely useful library. And congrats on version 0.14.

James





_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
