On Wed, Oct 12, 2011 at 07:38, Peter Prettenhofer
<peter.prettenho...@gmail.com> wrote:
> For the Averaged Perceptron you would usually do the following (given
> that you know the number of iterations, i.e. number of training
> examples times the number of epochs, in advance):
>
> when you update the j-th weight in the t-th iteration, w_j^{(t)}, you
> know that this update will ***remain in each weight vector for the
> next T - t iterations *** (where T is the total number of iterations
> to perform). So you can keep track of the average weight vector
> \bar{w} as you go; each time you update w_j with some constant v you
> update \bar{w}_j with (T-t) * v . Finally, you divide each non-zero
> value in \bar{w} by T.
>
> So, for the Averaged Perceptron the updates of the averaged weight
> vector are still sparse.
>
> The major difference between the Averaged Perceptron and ASGD is that
> with ASGD you have to deal with regularization, which boils down to
> scaling the weight vector at each iteration by some constant. That
> might be where a straightforward approach such as the one above
> breaks... I have to look into that more carefully...
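To make sure I follow, here is a minimal sketch of the lazy-averaging trick you describe, on a toy dense problem (variable names like `T` and `w_bar` are just illustrative). It also keeps an explicit running sum purely to check that the two averages agree:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = np.sign(X @ rng.randn(5))        # toy linearly generated labels

n_epochs = 10
T = n_epochs * X.shape[0]            # total iterations, known in advance
w = np.zeros(X.shape[1])
w_bar = np.zeros(X.shape[1])         # accumulates (T - t) * v per update
naive_sum = np.zeros(X.shape[1])     # explicit running sum, for checking only

t = 0
for _ in range(n_epochs):
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:     # perceptron mistake
            v = y_i * x_i            # the update applied to w
            w += v                   # sparse in practice; dense here for brevity
            w_bar += (T - t) * v     # this update survives the remaining iterations
        naive_sum += w               # O(n_features) per step -- what we avoid
        t += 1

w_bar /= T                           # lazy average of w over all iterations
naive_avg = naive_sum / T
print(np.allclose(w_bar, naive_avg))  # prints True
```

The point is that `w_bar` is only touched on mistakes, so its updates stay as sparse as the perceptron updates themselves.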

Thanks for the explanation, this really helped!

About your remark on regularization: as long as you're only doing L2
regularization, it amounts to multiplying the weight vector at each
iteration by a constant that depends on the learning rate and the
regularization constant (i.e., w -= lr * lambda * w is the same as
w = (1 - lr*lambda) * w). Since these are all multiplicative factors,
you can keep an unregularized w lying around and scale the updates
instead, which preserves the relative magnitudes between the old
components of w (which should have shrunk) and the newly updated
components (i.e., the learning rate at t+1 equals
(1/(1 - lr*lambda)) * lr, where lr is the learning rate at t). So you
can probably fold this into your equation above as a different
learning rate and everything should work.
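Concretely, the rescaling trick I mean looks something like this sketch: keep w = s * u, with a scalar s absorbing the L2 shrinkage so that only the active feature is touched per step (names are illustrative, not scikit-learn's actual internals). A dense reference version is included just to verify the equivalence:

```python
import numpy as np

rng = np.random.RandomState(1)
lr, lam = 0.1, 0.01
n_features, n_steps = 5, 50

w_dense = np.zeros(n_features)   # reference: explicit shrink + update
u = np.zeros(n_features)         # "unregularized" weights
s = 1.0                          # accumulated multiplicative shrink factor

for _ in range(n_steps):
    j = rng.randint(n_features)  # pretend only feature j is active this step
    v = rng.randn()              # some gradient-step value for feature j

    # explicit version: shrink every component, then apply the update
    w_dense *= (1.0 - lr * lam)
    w_dense[j] += v

    # lazy version: fold the shrinkage into s, rescale the update instead
    s *= (1.0 - lr * lam)
    u[j] += v / s                # O(1) work per step

w_lazy = s * u
print(np.allclose(w_lazy, w_dense))  # prints True
```

Dividing the update by the current s is exactly the "different learning rate" above: after one step it is lr / (1 - lr*lambda) relative to the original scale.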

-- 
 - Alexandre

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general