Hey James.
There is an implementation of CCA in scikit-learn:
http://scikit-learn.org/dev/modules/cross_decomposition.html

Afaik it is not in great shape, though, and if you want to improve it,
that would be greatly appreciated.

Cheers,
Andy

On 08/12/2013 10:59 PM, James Jensen wrote:
Hello!

You may already be familiar with canonical correlation analysis (CCA). Given two sets of variables, CCA yields the linear combinations with maximum correlation between them. It is similar to PCA, which finds projections with maximum variance for a single set of variables; in fact, PCA can be treated as a special case of CCA. Sigg et al. 2007 (http://ml2.inf.ethz.ch/papers/2007/0_nonnegative_cca.pdf) show how canonical correlation can be solved by iterative regression. This provides a straightforward and flexible way to apply regularization penalties and useful constraints such as non-negativity in CCA. I think CCA would be useful to have in Scikit-learn, and that it would be relatively easy to write a CCA solver using Scikit-learn's existing linear methods. I have wanted to contribute something myself along these lines, but have met with some difficulties.
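To make the iterative-regression view concrete, here is a minimal NumPy sketch of the first canonical pair. This is purely my own illustration (the function name, defaults, and normalization choices are made up, and it is unregularized): regress the current Y-score onto X, then the X-score onto Y, and repeat.

```python
# Sketch only: alternating least-squares regressions for the first
# canonical pair, in the spirit of Sigg et al. 2007. Not scikit-learn API.
import numpy as np

def cca_first_pair(X, Y, n_iter=100, tol=1e-8, seed=0):
    """Estimate the first pair of canonical weights (a, b) by
    alternating regressions. X is (n, p), Y is (n, q)."""
    rng = np.random.RandomState(seed)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    b = rng.randn(Y.shape[1])
    b /= np.linalg.norm(Y @ b)              # unit-norm Y-score
    for _ in range(n_iter):
        # Regress the current Y-score onto X to update a ...
        a, *_ = np.linalg.lstsq(X, Y @ b, rcond=None)
        a /= np.linalg.norm(X @ a)          # unit-norm X-score
        # ... then regress the X-score onto Y to update b.
        b_new, *_ = np.linalg.lstsq(Y, X @ a, rcond=None)
        b_new /= np.linalg.norm(Y @ b_new)
        if np.linalg.norm(b_new - b) < tol:
            b = b_new
            break
        b = b_new
    return a, b
```

With a regularized regression substituted for the plain least-squares steps, this is where the elastic net would slot in.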

CCA involves finding multiple directions that are orthogonal to each other, much like PCA. For all directions beyond the first, the objective function to be minimized includes not only the regression and regularization penalty terms, but also a penalty that enforces orthogonality of the solution; this term is zero within the feasible space.

Here is the objective function:

For the kth direction a_k,

a_k = \underset{a}{\arg\min\,} \frac{1}{2 n_{samples}} ||X a - Y b||_{2}^{2} + \alpha \rho ||a||_{1} + \frac{\alpha(1-\rho)}{2} ||a||_{2}^{2} + \lambda a^{\top} O a

which is the elastic net objective plus the orthogonality penalty term \lambda a^{\top} O a, where O = \sum_{l < k} {C_{XX} a_{l} a_{l}^{\top} C_{XX}}

and C_{XX} is the covariance matrix of X.
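For concreteness, the objective above translates directly into NumPy like this (purely illustrative; `prev_a` is a list holding the previously found directions a_l, and X, Y are assumed centered):

```python
# Direct evaluation of the penalized objective above; a sanity-check
# sketch, not an optimizer.
import numpy as np

def objective(a, b, X, Y, alpha, rho, lam, prev_a):
    n = X.shape[0]
    Cxx = (X.T @ X) / n                     # covariance of (centered) X
    # O = sum_{l < k} Cxx a_l a_l^T Cxx
    O = sum(Cxx @ np.outer(al, al) @ Cxx for al in prev_a)
    fit = 0.5 / n * np.sum((X @ a - Y @ b) ** 2)
    l1 = alpha * rho * np.sum(np.abs(a))
    l2 = 0.5 * alpha * (1 - rho) * np.dot(a, a)
    orth = lam * (a @ O @ a) if len(prev_a) > 0 else 0.0
    return fit + l1 + l2 + orth
```

Note that the orthogonality term is \lambda \sum_{l<k} (a^{\top} C_{XX} a_l)^2, so it is nonnegative and vanishes exactly when the new direction is C_{XX}-orthogonal to the earlier ones.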

Clearly I can't solve this using Scikit-learn's elastic net as-is. I thought perhaps I could reuse the same underlying coordinate descent code to optimize this new objective, but I have not been able to make sense of it. Could anyone give me some pointers? Of course, I would be delighted if any of the experienced developers were interested in implementing this themselves, since I am a novice, but I realize that they have many things to work on already.

Note: since appropriate regularization penalties might not be known beforehand, and different regularizations can be applied for each of the two sets of variables, it would be nice to also enable the use of cross-validation for model selection as in ElasticNetCV. But I should probably figure out the simpler case first.

I am also not confident about the details of how to arrive at a good value for \lambda. I understand that the basic idea is to begin with a very small value, so that we're close to an unconstrained solution to the problem (i.e. without the penalty term affecting the solution), and increase \lambda (using the previous solution as our initial guess in each iteration) until the solution no longer changes, i.e. we've converged to a solution that's within the feasible space. But how do I know how much to increase \lambda by at each iteration?
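For what it's worth, the kind of loop I have in mind looks like this. The geometric increase factor is an assumption on my part, not something from the paper, and `solve(a, lam)` stands for whatever routine minimizes the penalized objective at a fixed lambda:

```python
# Penalty-continuation sketch: increase lambda geometrically, warm-start
# each solve from the previous solution, stop when the solution is stable.
import numpy as np

def continuation(solve, a0, lam0=1e-4, factor=10.0, tol=1e-6, max_steps=20):
    a, lam = a0, lam0
    for _ in range(max_steps):
        a_new = solve(a, lam)               # warm start from previous a
        if np.linalg.norm(a_new - a) < tol:
            return a_new
        a, lam = a_new, lam * factor        # geometric increase (assumed)
    return a
```

My question is essentially whether `factor` should be fixed like this, or chosen adaptively.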

I'm curious about the motivation for using coordinate descent, and about any other methods that can be used for this sort of thing. From what I understand, the advantage of coordinate descent here is that the coefficients for the entire path of the regularization parameter can be found at very little extra cost. Aside from that, isn't it slower than gradient-based methods (i.e. takes more iterations to converge)? Although I guess in this case the derivative of the L1 norm is not defined where any element of the vector is zero, so perhaps gradient-based methods are out of the question? Are there other methods that can be used for elastic net that would accommodate the orthogonality penalty term? Could LARS be adapted to accommodate it? I see that Sigg et al. use monotone incremental forward stagewise regression (MIFSR), but they do it with an L1 penalty only, and I suspect that monotonicity would not hold for the elastic net penalty. If that's incorrect, let me know.
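For reference, here is my understanding of the plain elastic-net coordinate-descent update, i.e. a soft-thresholding step per coordinate. This is written from the standard formulation, not taken from scikit-learn's code, so the details are my own; it is the update I would need to modify to absorb the extra quadratic penalty:

```python
# Sketch of naive coordinate descent for the elastic net
# (1/2n)||y - Xa||^2 + alpha*rho*||a||_1 + alpha*(1-rho)/2*||a||^2.
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def cd_elastic_net(X, y, alpha, rho, n_iter=100):
    n, p = X.shape
    a = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n       # per-column scale ||X_j||^2 / n
    r = y.copy()                            # residual y - X @ a
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual
            z = X[:, j] @ r / n + col_sq[j] * a[j]
            a_j = soft_threshold(z, alpha * rho) / (col_sq[j] + alpha * (1 - rho))
            r += X[:, j] * (a[j] - a_j)     # keep residual in sync
            a[j] = a_j
    return a
```

Since the orthogonality term \lambda a^{\top} O a is quadratic in a, I would guess it changes each coordinate update only by shifting z and the denominator, which is part of why I hoped the existing coordinate descent code could be adapted.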

And I've been told there are probably more sophisticated methods for dealing with this kind of orthogonality constraint than this penalty method, but I don't know what these methods are. If anyone is familiar with this and could direct me to a good resource, I would appreciate it.

Sorry for the long email with many questions. Thanks for your help and for an extremely useful library. And congrats on version 0.14.

James





_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
