I've been applying preprocessing.scale() to my data before fitting scikit-learn's elastic net, on the understanding that elastic net will not work correctly unless each feature has zero mean and unit variance. scale() both centers and scales the data. ElasticNet has a normalize option for the input data, but its documentation does not mention centering.
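For concreteness, the preprocessing step I mean is roughly this (a minimal sketch; the data here is made up):

```python
import numpy as np
from sklearn import preprocessing

# Toy dense data: 4 samples, 2 features on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])

# scale() centers each column to zero mean and rescales it to unit variance.
X_scaled = preprocessing.scale(X)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```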
From what I gather, the scikit-learn implementation of elastic net is patterned after the R package glmnet. While browsing the paper "Regularization Paths for Generalized Linear Models via Coordinate Descent", I came across this paragraph about how glmnet exploits sparsity:

    Coordinate descent is ideally set up to exploit such sparsity, in an
    obvious way. The O(N) inner-product operations in either the naive or
    covariance updates can exploit the sparsity, by summing over only the
    non-zero entries. Note that in this case scaling of the variables will
    not alter the sparsity, but centering will. So scaling is performed up
    front, but the centering is incorporated in the algorithm in an
    efficient and obvious manner.

Is the same true of scikit-learn's elastic net? That is, can I skip centering the data because it is handled internally in a way that still lets the algorithm exploit sparsity? If so, that would be fortunate for me, because my data is sparse; I had given up on exploiting the sparsity because of the need to center the data.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
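For reference, the sparse workflow I would like to use, sketched under the assumption that explicit centering can be skipped (the data and hyperparameters here are placeholders; with_mean=False scales without centering, which keeps the matrix sparse):

```python
import numpy as np
import scipy.sparse as sp
from sklearn import preprocessing
from sklearn.linear_model import ElasticNet

# A small random sparse design matrix (CSR) and a target vector.
rng = np.random.RandomState(0)
X = sp.random(100, 20, density=0.1, format="csr", random_state=rng)
y = rng.randn(100)

# Scale each feature to unit variance WITHOUT centering: subtracting the
# column means would turn every zero entry nonzero and destroy sparsity,
# so with_mean=False is required for sparse input.
X_scaled = preprocessing.scale(X, with_mean=False)

# Fit directly on the sparse, scaled matrix.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_scaled, y)
print(model.coef_.shape)
```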
