I've been applying preprocessing.scale() to my data before using
scikit-learn's ElasticNet, under the understanding that elastic net will
not work correctly unless each feature has zero mean and unit variance.
scale() both centers the data and scales it to unit variance. ElasticNet
has an option to normalize the input data, but its documentation does not
mention centering.
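
For concreteness, here is roughly what my current preprocessing looks like
(the toy data and the alpha/l1_ratio values are arbitrary placeholders):

```python
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import ElasticNet

# Toy dense data; values chosen only for illustration.
X = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
y = np.array([1.0, 2.0, 3.0])

# scale() standardizes each column: zero mean, unit variance.
X_scaled = preprocessing.scale(X)
print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]

# Fit on the standardized features (hyperparameters arbitrary).
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_scaled, y)
print(model.coef_)
```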

From what I gather, the scikit-learn implementation of elastic net is
patterned after the R package glmnet. I was browsing the paper
"Regularization Paths for Generalized Linear Models via Coordinate 
Descent" and came across this paragraph about how glmnet exploits sparsity:

    Coordinate descent is ideally set up to exploit such sparsity, in an
    obvious way. The O(N) inner-product operations in either the naive or
    covariance updates can exploit the sparsity, by summing over only the
    non-zero entries. Note that in this case scaling of the variables will
    not alter the sparsity, but centering will. So scaling is performed up
    front, but the centering is incorporated in the algorithm in an
    efficient and obvious manner.
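
The scaling-versus-centering point is easy to see on a toy sparse vector
(this is just my own sketch of the idea, not glmnet's actual code):

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)

# A 1000-entry sparse column with ~10 nonzeros.
x = sparse.random(1000, 1, density=0.01, random_state=rng).tocsc()

# Scaling by a constant keeps the same nonzero pattern...
x_scaled = x / np.sqrt(x.multiply(x).sum())
print(x.nnz, x_scaled.nnz)  # same nonzero count

# ...but subtracting the column mean fills in (almost) every entry,
# since each former zero becomes -mean.
x_centered = x.toarray() - x.toarray().mean()
print(np.count_nonzero(x_centered))  # nearly all 1000 entries nonzero
```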

Is this also true of the way scikit's elastic net works? That is to say, 
do I not need to center the data, because it is performed internally in 
a way that still allows the algorithm to exploit sparsity?

If so, it would be fortunate for me, because my data is sparse. I had 
given up on exploiting the sparsity because of the need to center the data.
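
If the answer is yes, what I would like to do is something like the
following sketch: scale without centering so the matrix stays sparse, then
hand the sparse matrix to ElasticNet directly (assuming ElasticNet accepts
scipy.sparse input; the data and hyperparameters here are arbitrary):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = sparse.random(100, 20, density=0.1, format='csr', random_state=rng)
y = rng.randn(100)

# with_mean=False scales to unit variance without centering,
# so the input stays sparse.
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X)
print(sparse.issparse(X_scaled))  # still sparse

model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_scaled, y)
print(model.coef_.shape)
```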

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
