On Wed, Oct 12, 2011 at 02:56, Mathieu Blondel <math...@mblondel.org> wrote:
> On Wed, Oct 12, 2011 at 2:52 PM, Peter Prettenhofer
> <peter.prettenho...@gmail.com> wrote:
>
>> The results in [Xu 2011] are pretty impressive given the simplicity of
>> the algorithm - we should definitely give it a try. Unfortunately, the
>> algorithm shares some of the undesirable properties of SGD: you need a
>> number of heuristics to make it work (e.g. learning rate schedule,
>> averaging start point t_0)
>
> Indeed, averaging has been used for ages in the Perceptron community.
> CRFsuite has been supporting averaging for quite some time too I
> think. ASGD's results look indeed impressive, though.
Does anyone know how to implement parameter averaging without touching every feature at every iteration? With models like CRFs you easily have millions of features but only a few hundred active per example, so touching the full weight vector on every update is a pain.

On the page, he mentions that "Both the stochastic gradient weights and the averaged weights are represented using a linear transformation that yields efficiency gains for sparse training data." Does anyone know what this representation is?

--
- Alexandre

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
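For what it's worth, here is a minimal sketch of one such trick (not necessarily the exact representation the page refers to; the class and names below are purely illustrative). With updates w_t = w_{t-1} - eta * g_t, one can show sum_{s<=T} w_s = T * w_T + r_T, where r_t = r_{t-1} + (t-1) * eta * g_t. Both w and r receive only sparse updates (just the features active in g_t), and the average w_T + r_T / T is materialized only on demand:

```python
# Lazy ASGD averaging sketch: maintain current weights w and a sparse
# correction r so that the running average equals w + r / t, without
# ever touching inactive features during an update.
from collections import defaultdict

class LazyAveragedSGD:
    """Illustrative sketch, not an API from any library."""

    def __init__(self, eta=0.1):
        self.eta = eta
        self.w = defaultdict(float)   # current weight vector (sparse)
        self.r = defaultdict(float)   # correction term (sparse)
        self.t = 0                    # number of updates performed

    def update(self, grad):
        """grad: dict {feature_index: gradient_value}, few nonzeros.
        Only the active features of grad are touched."""
        self.t += 1
        for j, g in grad.items():
            self.w[j] -= self.eta * g
            self.r[j] += (self.t - 1) * self.eta * g

    def averaged(self):
        """Return (1/T) * sum of w_1 .. w_T, materialized on demand."""
        return {j: self.w[j] + self.r[j] / self.t
                for j in set(self.w) | set(self.r)}
```

Per-update cost is proportional to the number of active features, and a quick check against explicitly summing the iterates confirms the identity holds. Handling a learning-rate schedule or L2 scaling needs extra bookkeeping (e.g. a weight-divisor scalar), which this sketch omits.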