On Wed, Oct 12, 2011 at 02:56, Mathieu Blondel <math...@mblondel.org> wrote:
> On Wed, Oct 12, 2011 at 2:52 PM, Peter Prettenhofer
> <peter.prettenho...@gmail.com> wrote:
>
>> The results in [Xu 2011] are pretty impressive given the simplicity of
>> the algorithm - we should definitely give it a try. Unfortunately, the
>> algorithm shares some of the undesirable properties of SGD: you need a
>> number of heuristics to make it work (e.g. learning rate schedule,
>> averaging start point t_0)
>
> Indeed, averaging has been used for ages in the Perceptron community.
> CRFsuite has supported averaging for quite some time too, I think.
> ASGD's results do look impressive, though.

Does anyone know how to implement parameter averaging without touching
every feature at every iteration? With models like CRFs you easily
have millions of features but only a few hundred active per example,
so touching everything at every step is painful. On the page he
mentions that

    Both the stochastic gradient weights and the averaged weights are
    represented using a linear transformation that yields efficiency gains
    for sparse training data.

Does anyone know what representation this refers to?
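For reference, one classic trick from the averaged-perceptron literature (I'm not certain it is the exact "linear transformation" the page refers to, which I believe also rescales the two weight vectors by running scalars) is lazy, just-in-time averaging with per-feature timestamps: keep an accumulator of the running sum of the weights plus the step at which each feature was last synced, and only bring a feature's accumulator up to date when that feature is actually touched. A minimal sketch, with made-up names and a toy sparse-row format:

```python
import numpy as np

def averaged_perceptron(X_rows, ys, n_features, n_epochs=5):
    """Averaged perceptron with lazy ("just-in-time") averaging.

    X_rows: list of (indices, values) sparse rows; ys: labels in {-1, +1}.
    Per step, only the features active in the current example are touched.
    """
    w = np.zeros(n_features)                      # current weights
    acc = np.zeros(n_features)                    # running sum of w over steps, synced lazily
    last = np.zeros(n_features, dtype=np.int64)   # step at which acc[j] was last synced
    t = 0
    for _ in range(n_epochs):
        for (idx, vals), y in zip(X_rows, ys):
            t += 1
            if y * (w[idx] @ vals) <= 0:          # mistake -> update
                # w[j] has been constant since last[j], so catch acc up in one shot
                acc[idx] += (t - last[idx]) * w[idx]
                last[idx] = t
                w[idx] += y * vals
    # one dense flush at the very end so every feature covers all t steps
    acc += (t - last) * w
    return acc / max(t, 1)
```

The per-step cost is proportional to the number of active features; the only dense pass is the single flush at the end, so millions of mostly-inactive features cost nothing during training.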

-- 
 - Alexandre

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
