How is the default grid of alphas and l1_ratios chosen in
scikit-learn's ElasticNetCV, and what is the reasoning behind it? What
other approaches exist for choosing such a parameter grid, and what are
they based on?
I'm using elastic net to calculate regularized canonical correlation.
Given data matrices X and Y, I find coefficient vectors a and b that
maximize the correlation between Xa and Yb. This can be done by
iteratively regressing X on Yb (to estimate a) and then Y on Xa (to
estimate b), and repeating these two regressions until convergence.
This iterative approach means that I have to do model selection one
level up from the individual regressions (i.e. I can't use ElasticNetCV
or the like directly). I know I can choose among candidate grids by
cross-validation or permutation testing, but I am unsure how to
intelligently choose the sets of alpha and l1_ratio values to try. And
since the parameters can differ between the two regressions, the number
of parameter combinations to try is squared, so I need to choose the
grid carefully.
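For concreteness, here is a minimal sketch of the alternating-regression scheme described above, with scikit-learn's ElasticNet as the inner solver. The function name `cca_enet`, the random initialization of b, and the unit-norm rescaling of each update are my own choices, not part of any library; the convergence check is a simple change-in-coefficients tolerance.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def cca_enet(X, Y, alpha_a, l1_ratio_a, alpha_b, l1_ratio_b,
             max_iter=100, tol=1e-4, seed=0):
    """Sketch of regularized CCA by alternating elastic-net regressions.

    Hypothetical helper (not a scikit-learn API): finds coefficient
    vectors a, b such that corr(Xa, Yb) is locally maximized under
    per-side elastic-net penalties.
    """
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(Y.shape[1])
    b /= np.linalg.norm(b)
    a = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Regress X on the current canonical variate Yb to update a.
        a_new = ElasticNet(alpha=alpha_a, l1_ratio=l1_ratio_a,
                           fit_intercept=False).fit(X, Y @ b).coef_
        if np.linalg.norm(a_new) > 0:
            a_new = a_new / np.linalg.norm(a_new)
        # Regress Y on the updated variate Xa to update b.
        b_new = ElasticNet(alpha=alpha_b, l1_ratio=l1_ratio_b,
                           fit_intercept=False).fit(Y, X @ a_new).coef_
        if np.linalg.norm(b_new) > 0:
            b_new = b_new / np.linalg.norm(b_new)
        converged = (np.linalg.norm(a_new - a) < tol and
                     np.linalg.norm(b_new - b) < tol)
        a, b = a_new, b_new
        if converged:
            break
    return a, b
```

The model selection then wraps this whole function: for each candidate (alpha_a, l1_ratio_a, alpha_b, l1_ratio_b) you run the alternation to convergence and score corr(Xa, Yb) on held-out data, which is exactly why the size of the grid matters so much here.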
Some ideas I've had:
* Perhaps the ratio of samples to features can rule out certain
  regularization parameter values, i.e. if there are many more features
  than samples, too weak a regularization would be inappropriate (the
  unregularized problem is ill-posed). Has this been formalized
  mathematically? Wouldn't it depend on how strong the signal is, too?
* If the solution with a particular regularization strength is a
vector of zeros (i.e. the regularization was too strong), then I can
discard all stronger regularization parameters. This is obvious with
only an L1 penalty; if alpha=0.1 is too strong, then alpha=0.5 will
definitely also be too strong. I wonder about this in the case of
elastic net. That is, if (alpha=0.1, l1_ratio=0.5) is too strong,
does that mean (alpha=0.1, l1_ratio=0.9) will necessarily be too strong?
* And perhaps I could start with a coarse grid and then try again with
more detail in a promising section of it. Any ideas on the best way
of doing this?
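The second and third ideas can be made concrete via the subgradient condition for the elastic net: the all-zero solution is optimal exactly when alpha * l1_ratio >= max|X^T y| / n_samples. If I have that right, it answers the second bullet in the affirmative (if (alpha=0.1, l1_ratio=0.5) zeroes everything, so does (alpha=0.1, l1_ratio=0.9), since the product alpha * l1_ratio only grows), and it gives a natural anchor for the grid: scikit-learn's default grid is, as far as I know, log-spaced downward from this alpha_max. A sketch, assuming centered X and y; the helper names `alpha_grid` and `refine` are mine, not scikit-learn's:

```python
import numpy as np

def alpha_grid(X, y, l1_ratio, n_alphas=20, eps=1e-3):
    # Smallest alpha at which the elastic-net solution is exactly zero
    # (assumes X and y are centered); stronger alphas can be discarded.
    n_samples = X.shape[0]
    alpha_max = np.max(np.abs(X.T @ y)) / (n_samples * l1_ratio)
    # Log-spaced grid from alpha_max down to eps * alpha_max.
    return np.logspace(np.log10(alpha_max), np.log10(eps * alpha_max),
                       n_alphas)

def refine(alphas, best_idx, n_alphas=10):
    # Coarse-to-fine: zoom into the interval bracketing the best coarse
    # alpha (alphas are in decreasing order).
    hi = alphas[max(best_idx - 1, 0)]
    lo = alphas[min(best_idx + 1, len(alphas) - 1)]
    return np.logspace(np.log10(hi), np.log10(lo), n_alphas)
```

In the iterative CCA setting, the response vectors Yb and Xa change every outer iteration, so alpha_max would have to be recomputed per side from the current canonical variate; using the initial variate to set the grid once is a pragmatic approximation.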
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general