How is the default grid of alphas and L1 ratios chosen for scikit-learn's ElasticNetCV, and what is the reasoning behind it? What other approaches exist for choosing this parameter grid, and what are they based on?

I'm using elastic net to compute regularized canonical correlation. Given data matrices X and Y, I want coefficient vectors a and b that maximize the correlation between Xa and Yb. This can be done by alternating two regressions: regress Yb on X (to estimate a), then regress Xa on Y (to estimate b), and repeat until convergence.
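For concreteness, here is roughly what my iteration looks like, as a simplified sketch using scikit-learn's ElasticNet (the alpha/l1_ratio values are just placeholders, and the normalization and stopping details are my own choices):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def enet_cca(X, Y, alpha_a=0.1, alpha_b=0.1, l1_ratio=0.5,
             n_iter=100, tol=1e-6, seed=0):
    """Alternating elastic-net regressions for regularized CCA.

    Estimates coefficient vectors a, b so that corr(Xa, Yb) is high, by
    regressing the current variate Yb on X (to update a) and Xa on Y
    (to update b) until the coefficients stop changing.
    """
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(Y.shape[1])   # random start for b
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a_old = a.copy()
        # Regress the current canonical variate Yb on X to update a.
        a = ElasticNet(alpha=alpha_a, l1_ratio=l1_ratio).fit(X, Y @ b).coef_
        if np.linalg.norm(X @ a) > 0:
            a /= np.linalg.norm(X @ a)    # fix the scale of the variate
        # Regress Xa on Y to update b.
        b = ElasticNet(alpha=alpha_b, l1_ratio=l1_ratio).fit(Y, X @ a).coef_
        if np.linalg.norm(Y @ b) > 0:
            b /= np.linalg.norm(Y @ b)
        if np.linalg.norm(a - a_old) < tol:
            break
    return a, b
```

The normalization step is just to pin down the (otherwise arbitrary) scale of the variates between iterations.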

This iterative approach means the model selection has to happen a level above the individual regressions (i.e. I can't use ElasticNetCV or the like directly). I know I can choose from grids of parameters by cross-validation or permutation, but I am unsure how to choose the sets of alpha and L1-ratio values to try intelligently. And since the parameters can differ between the two regressions, the number of parameter combinations grows quadratically, so I need to choose the grid carefully.
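For reference, my reading of how ElasticNetCV builds its default alpha grid (from the `_alpha_grid` helper in `sklearn.linear_model`; this is my reconstruction, so details may be slightly off): alpha_max is the smallest alpha at which all coefficients are exactly zero, and the grid is n_alphas values (default 100) log-spaced down from alpha_max to eps * alpha_max (default eps=1e-3). There is no comparable data-driven grid for l1_ratio; the default is the single value 0.5, and the docs suggest supplying a list skewed toward 1 such as [.1, .5, .7, .9, .95, .99, 1].

```python
import numpy as np

def default_alpha_grid(X, y, l1_ratio=0.5, eps=1e-3, n_alphas=100):
    """My reconstruction of scikit-learn's data-driven alpha grid.

    alpha_max comes from the KKT condition at coef = 0: the solution is
    all zeros exactly when max_j |x_j^T y| / n_samples <= alpha * l1_ratio.
    The grid is log-spaced, descending from alpha_max to eps * alpha_max.
    """
    Xc = X - X.mean(axis=0)          # mimic the fit_intercept=True centering
    yc = y - y.mean()
    alpha_max = np.abs(Xc.T @ yc).max() / (X.shape[0] * l1_ratio)
    return np.logspace(np.log10(alpha_max * eps),
                       np.log10(alpha_max), n_alphas)[::-1]
```

So the reasoning, as I understand it, is to span the whole useful range: from "everything is zeroed out" down to nearly unregularized, with log spacing because the fit changes roughly multiplicatively in alpha.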

Some ideas I've had:

 * Perhaps the ratio of samples to features can rule out certain
   regularization strengths, i.e. if there are many more samples
   than features, regularization that is too weak would be
   inappropriate. Has this been formalized mathematically? Wouldn't
   it also depend on how strong the signal is?
 * If the solution with a particular regularization strength is a
   vector of zeros (i.e. the regularization was too strong), then I can
   discard all stronger regularization parameters. This is obvious with
   only an L1 penalty; if alpha=0.1 is too strong, then alpha=0.5 will
   definitely also be too strong. I wonder about this in the case of
   elastic net. That is, if (alpha=0.1, l1_ratio=0.5) is too strong,
   does that mean (alpha=0.1, l1_ratio=0.9) will necessarily be too strong?
 * And perhaps I could start with a coarse grid and then try again with
   more detail in a promising section of it. Any ideas on the best way
   of doing this?
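On the second bullet, I think the all-zero condition can be made exact: for the scikit-learn elastic-net objective (1/(2n))||y - Xw||^2 + alpha*l1_ratio*||w||_1 + (alpha/2)*(1 - l1_ratio)*||w||^2, the KKT condition at w = 0 reduces to max_j |x_j^T y| / n <= alpha * l1_ratio, since the L2 term contributes nothing at zero. "Too strong" is therefore monotone in the product alpha * l1_ratio, so if (alpha=0.1, l1_ratio=0.5) gives all zeros, (alpha=0.1, l1_ratio=0.9) necessarily does too. A tiny numpy check of this (my own helper, assuming centered X and y):

```python
import numpy as np

def is_all_zero(X, y, alpha, l1_ratio):
    """True iff the elastic-net solution for (alpha, l1_ratio) is exactly zero.

    At w = 0 the gradient of the smooth part is -(1/n) X^T y, and zero is
    optimal iff every component fits inside the L1 subgradient, i.e.
    max_j |x_j^T y| / n <= alpha * l1_ratio.  X and y assumed centered.
    """
    return np.abs(X.T @ y).max() / X.shape[0] <= alpha * l1_ratio
```

Since the condition depends only on the product alpha * l1_ratio, any pair with a larger product can be pruned as soon as one all-zero pair is found, which should also combine well with the coarse-to-fine idea above.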



_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general