Thanks all for working on solving this issue. Here are other related questions in light of Gael's email:
As far as I understand [1], the alpha-based regularisation in the l2-regularized SGD models is scaled by n_samples (SGD models, logistic regression, elastic net...): is this a bug or not? The loss and regularization terms both grow with n_samples, hence alpha in the l2-regularized SGD models seems to be equivalent to (n_samples / C) of the SVM formulation.

[1] http://scikit-learn.org/dev/modules/sgd.html#mathematical-formulation

In Ridge regression, alpha is 1 / (2 * C) according to http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge, hence I assume it is unscaled, as expected.

What about alpha in the elastic net models (coordinate descent and SGD), where the penalty term is `alpha * (rho * l2 + l1)`? Should this be scaled or not?

Also, another way to circumvent the n_samples change issue when doing CV-based model selection of sparse models might be to use the Bootstrap (sampling with replacement) and make the training size of the folds artificially fixed to the size of the total training set (by having redundant samples). I wonder whether this is a good idea or not (having the same sample show up several times in the training set might be a bad idea).

-- Olivier
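To illustrate the Ridge point, here is a minimal check (my own sketch, not from the thread) that Ridge's alpha is applied unscaled, i.e. it solves min ||y - Xw||^2 + alpha * ||w||^2 with no n_samples factor, by comparing against the closed-form solution of that objective:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
n_samples, n_features = 50, 5
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)

alpha = 3.0
ridge = Ridge(alpha=alpha, fit_intercept=False)
ridge.fit(X, y)

# Closed-form solution of the unscaled objective:
# (X^T X + alpha * I) w = X^T y
w = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)
print(np.allclose(ridge.coef_, w))
```

If alpha were scaled by n_samples internally, the two solutions would only agree after multiplying alpha by n_samples in the manual solve; as written they should match directly.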
