On Wed, Oct 12, 2011 at 07:38, Peter Prettenhofer
<peter.prettenho...@gmail.com> wrote:
> For the Averaged Perceptron you would usually do the following (given
> that you know the number of iterations, i.e. the number of training
> examples times the number of epochs, in advance):
>
> When you update the j-th weight in the t-th iteration, w_j^{(t)}, you
> know that this update will ***remain in the weight vector for the
> next T - t iterations*** (where T is the total number of iterations
> to perform). So you can keep track of the average weight vector
> \bar{w} as you go: each time you update w_j with some constant v, you
> update \bar{w}_j with (T - t) * v. Finally, you divide each non-zero
> value in \bar{w} by T.
>
> So, for the Averaged Perceptron, the updates of the averaged weight
> vector are still sparse.
>
> The major difference between the Averaged Perceptron and ASGD is that
> you have to deal with regularization, which boils down to scaling the
> weight vector at each iteration by some constant. This might be why a
> straightforward approach such as the one above could break... I have
> to look into that more carefully.
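For concreteness, the (T - t) trick quoted above can be sketched roughly like this (a minimal NumPy sketch; the function name, signature, and the plain mistake-driven update are illustrative, not scikit-learn's implementation):

```python
import numpy as np

def averaged_perceptron(X, y, n_epochs=5):
    """Averaged Perceptron using the deferred (T - t) averaging trick.

    Instead of adding the full weight vector to the running sum after
    every iteration (a dense O(n_features) operation), each update v
    to w_j is added to the accumulator as (T - t) * v, since that
    update will remain in w for the T - t remaining iterations.
    """
    n_samples, n_features = X.shape
    T = n_epochs * n_samples            # total number of iterations
    w = np.zeros(n_features)
    w_bar = np.zeros(n_features)        # accumulator for the average
    b = 0.0
    b_bar = 0.0
    t = 0
    for _ in range(n_epochs):
        for i in range(n_samples):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:
                v = y[i] * X[i]         # the (possibly sparse) update
                w += v
                b += y[i]
                w_bar += (T - t) * v    # survives T - t more iterations
                b_bar += (T - t) * y[i]
            t += 1
    # divide the accumulator by T to get the averaged parameters
    return w_bar / T, b_bar / T
```

If X[i] is sparse, the `w_bar += (T - t) * v` step touches only the non-zero features, which is the whole point of the trick.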
Thanks for the explanation, this really helped!

About your remark on regularization: as long as you're only doing L2
regularization, it's the same as multiplying the weight vector at each
iteration by a constant that depends on your learning rate and your
regularization constant (i.e., w -= lr * lambda * w is the same as
w = (1 - lr*lambda) * w). Since these are all multiplicative factors,
you can keep an unregularized w lying around and rescale the updates so
that the relative magnitudes between the old components of w (which
should have shrunk) and the newly updated components stay correct
(i.e., the learning rate at t+1 is (1 / (1 - lr*lambda)) * lr, where lr
is the learning rate at t). So you can probably fold this into your
scheme above as a different learning rate and everything should work.

--
- Alexandre

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
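To make the multiplicative-factor argument above concrete, here is a minimal sketch of SGD with the L2 shrink folded into a single scalar scale factor, so the stored vector is never touched by the dense regularization step (names, the perceptron-style update, and the constant learning rate are all illustrative assumptions):

```python
import numpy as np

def sgd_l2_lazy(X, y, lr=0.1, lam=0.01, n_epochs=5):
    """SGD with L2 regularization via a lazy multiplicative scale.

    The shrink w = (1 - lr*lam) * w is accumulated in the scalar
    `scale` instead of being applied to the vector, so only the
    data-driven (possibly sparse) updates touch w_raw.  The effective
    weight vector is always scale * w_raw.
    """
    n_samples, n_features = X.shape
    w_raw = np.zeros(n_features)
    scale = 1.0
    for _ in range(n_epochs):
        for i in range(n_samples):
            # lazy L2 shrink: one scalar multiply instead of O(n) work
            scale *= (1.0 - lr * lam)
            # mistake-driven update on the effective weights scale*w_raw
            if y[i] * scale * np.dot(w_raw, X[i]) <= 0:
                # adding lr * y_i * x_i to the effective weights means
                # dividing the raw update by the current scale
                w_raw += (lr / scale) * y[i] * X[i]
    return scale * w_raw
```

This is exactly the "different learning rate" view: at each step the raw update is scaled by 1/scale, which grows by the factor 1/(1 - lr*lam) per iteration. In a long-running implementation one would periodically fold `scale` back into `w_raw` to avoid numerical underflow of the scale factor.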