Hey James.
There is an implementation of CCA in scikit-learn:
http://scikit-learn.org/dev/modules/cross_decomposition.html
As far as I know it is not in great shape, though, and if you want to
improve it, that would be greatly appreciated.
Cheers,
Andy
On 08/12/2013 10:59 PM, James Jensen wrote:
Hello!
You may already be familiar with canonical correlation analysis (CCA).
Given two sets of variables, CCA yields the linear combinations with
maximum correlation between them. It is similar to PCA, which finds
projections with maximum variance for a single set of variables; in
fact, PCA can be treated as a special case of CCA. Sigg et al. 2007
(http://ml2.inf.ethz.ch/papers/2007/0_nonnegative_cca.pdf) discuss how
the canonical correlation problem can be solved by iterative
regression. This provides a straightforward and flexible way to apply
regularization penalties and useful constraints, such as
non-negativity, in CCA. I
think CCA would be useful to have in Scikit-learn and that it would be
relatively easy to write a CCA solver using Scikit-learn's existing
linear methods. I have wanted to contribute something myself along
these lines but have met with some difficulties.
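To make the iterative-regression idea concrete, here is a rough sketch of one direction of sparse CCA via alternating elastic-net regressions. The function name and all hyperparameters are mine, chosen for illustration; this is the plain alternating scheme, not Sigg et al.'s exact algorithm:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def cca_one_direction(X, Y, alpha=1e-3, l1_ratio=0.5, n_iter=50):
    """Sketch: one CCA direction by alternating regularized regressions.

    Alternates between regressing the current Y-side score onto X and
    the current X-side score onto Y, normalizing each weight vector so
    its score has unit norm.  Assumes roughly centered columns.
    """
    rng = np.random.RandomState(0)
    b = rng.normal(size=Y.shape[1])
    b /= np.linalg.norm(Y @ b)
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                       fit_intercept=False).fit(X, Y @ b).coef_
        if np.linalg.norm(X @ a) == 0:   # over-penalized: all-zero solution
            break
        a /= np.linalg.norm(X @ a)
        b = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                       fit_intercept=False).fit(Y, X @ a).coef_
        b /= np.linalg.norm(Y @ b)
    return a, b
```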
CCA involves finding multiple directions that are orthogonal to each
other, much like PCA. For all directions beyond the first, the
objective function to be minimized includes not only the regression
and regularization penalty terms, but also a penalty intended to
enforce the orthogonality of the solution, a term that is zero within
the feasible space.
Here is the objective function: for the kth direction a_k,

a_k = \underset{a}{\arg\min}\; \frac{1}{2 n_{\mathrm{samples}}} ||X a -
Y b||_2^2 + \alpha \rho ||a||_1 + \frac{\alpha(1-\rho)}{2} ||a||_2^2 +
\lambda\, a^{\top} O a

which is the elastic net objective plus the orthogonality penalty
term, with

O = \sum_{l < k} C_{XX} a_l a_l^{\top} C_{XX}

where C_{XX} is the covariance matrix of X.
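Written out directly in NumPy (a literal transcription of the objective above, assuming centered columns; the function and variable names are mine):

```python
import numpy as np

def orthogonality_penalty_matrix(X, prev_dirs):
    """O = sum_{l<k} Cxx a_l a_l^T Cxx over previously found directions."""
    n = X.shape[0]
    Cxx = X.T @ X / n                 # covariance of X (columns centered)
    O = np.zeros((X.shape[1], X.shape[1]))
    for a_l in prev_dirs:
        v = Cxx @ a_l                 # Cxx symmetric, so the term is v v^T
        O += np.outer(v, v)
    return O

def objective(a, b, X, Y, alpha, rho, lam, prev_dirs):
    """Elastic net term plus the orthogonality penalty, as in the email."""
    n = X.shape[0]
    r = X @ a - Y @ b
    O = orthogonality_penalty_matrix(X, prev_dirs)
    return (r @ r / (2 * n)
            + alpha * rho * np.abs(a).sum()
            + alpha * (1 - rho) / 2 * (a @ a)
            + lam * a @ O @ a)
```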
Clearly I can't solve this using Scikit-learn's elastic net as-is.
I thought perhaps I could find a way to use the same underlying
coordinate descent code to optimize this new function. But I have not
been able to make sense of it. Could anyone give me some pointers? Of
course, I would be delighted if any of the experienced developers were
interested in implementing this themselves, since I am a novice, but I
realize that they have many things to work on already.
Note: since appropriate regularization penalties might not be known
beforehand, and different regularizations can be applied for each of
the two sets of variables, it would be nice to also enable the use of
cross-validation for model selection as in ElasticNetCV. But I should
probably figure out the simpler case first.
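For the single-regression building block, the existing cross-validated estimator already does this kind of selection; something along these lines (synthetic data, parameters purely illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)

# ElasticNetCV fits the regularization path for each candidate l1_ratio
# and picks (alpha, l1_ratio) by cross-validated prediction error.
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.9], cv=5).fit(X, y)
```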
I am also not confident about the details of how to arrive at a good
value for \lambda. I understand that the basic idea is to begin with a
very small value, so that we're close to an unconstrained solution to
the problem (i.e. without the penalty term affecting the solution),
and increase \lambda (using the previous solution as our initial guess
in each iteration) until the solution no longer changes, i.e. we've
converged to a solution that's within the feasible space. But how do I
know how much to increase \lambda by at each iteration?
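As far as I know, a common heuristic is a geometric schedule: multiply \lambda by a fixed factor (say 2 or 10) each round, warm-starting from the previous solution, and stop when the iterates stabilize. A sketch, where the inner `solve` is a placeholder for whatever minimizes the penalized objective at fixed \lambda:

```python
import numpy as np

def continuation(solve, a0, lam0=1e-4, factor=2.0, tol=1e-6, max_steps=100):
    """Sketch of the penalty/continuation scheme described above.

    `solve(a_init, lam)` is assumed to minimize the penalized objective
    for fixed lambda, warm-started at a_init.  Lambda is increased
    geometrically until the warm-started solution stops moving.
    """
    a, lam = np.asarray(a0, dtype=float), lam0
    for _ in range(max_steps):
        a_new = solve(a, lam)
        if np.linalg.norm(a_new - a) < tol:   # converged: feasible solution
            return a_new, lam
        a, lam = a_new, lam * factor
    return a, lam
```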
I'm curious about the motivation for using coordinate descent, and
about any other methods that can be used for this sort of thing. From
what I understand, the advantage of coordinate descent here is that
the coefficients for the entire path of the regularization parameter
can be found at very little extra cost. Aside from that, isn't it
slower than gradient-based methods (i.e. takes more iterations to
converge)? Although I guess in this case the derivative of the L1 norm
is not defined where any element of the vector is zero, so perhaps
gradient-based methods are out of the question? Are there other
methods that can be used for elastic net that would accommodate the
orthogonality penalty term? Could LARS be adapted to accommodate it? I
see that Sigg et al. use monotone incremental forward stagewise
regression (MIFSR), but they do it with an L1 penalty only, and I
suspect that monotonicity would not hold for the elastic net penalty.
If that's incorrect, let me know.
And I've been told there are probably more sophisticated methods for
dealing with this kind of orthogonality constraint than this penalty
method, but I don't know what these methods are. If anyone is familiar
with this and could direct me to a good resource, I would appreciate it.
Sorry for the long email with many questions. Thanks for your help and
for an extremely useful library. And congrats on version 0.14.
James
------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general