On 20 March 2012 at 22:38, David Warde-Farley <[email protected]> wrote:
> On Tue, Mar 20, 2012 at 10:16:22PM +0100, Olivier Grisel wrote:
>> On 20 March 2012 at 22:06, David Warde-Farley <[email protected]> wrote:
>> > On Tue, Mar 20, 2012 at 09:05:01PM +0100, David Marek wrote:
>> >
>> >> I found loss functions in sgd_fast.pyx. Shouldn't they be used?
>> >
>> > SGD is a minimization strategy, independent of any particular loss
>> > function.
>> > The hinge loss and log loss are implemented but other losses are possible,
>> > e.g. multiclass cross-entropy is very popular in the neural networks
>> > literature (more so than one-vs-all or one-vs-one hinge loss), squared error
>> > or absolute error for regression tasks, etc.
>>
>> Hi David,
>>
>> We would indeed need a multiclass cross-entropy loss function for an
>> MLP impl, but it would also be useful for linear models, to naturally
>> train multiclass linear models without one-vs-all.
>>
>> About optimizers for ANNs, do you know if Polyak-Ruppert averaging is
>> useful in practice for SGD optimizers on non-linear models such as the
>> feed-forward MLP or autoencoders?
>
> AFAIK, yes. Yann LeCun's group won the optimization challenge at
> the "Challenges in Hierarchical Models" workshop at NIPS using
> Polyak-averaged SGD on a deep autoencoder.
>
> http://cs.nyu.edu/~zsx/nips2011/
>
> IIRC the theory behind Polyak averaging is pretty robust to the choice of
> model and loss function.
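
For anyone following the thread, here is a minimal sketch of the two ideas
quoted above: SGD on a multiclass cross-entropy (softmax) loss for a linear
model, with Polyak-Ruppert averaging of the iterates. It is not the code in
sgd_fast.pyx. The function names (softmax, cross_entropy, averaged_sgd), the
fixed learning rate, the absence of a bias term, and averaging from the very
first step are all assumptions made for illustration.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(W, X, y):
    """Mean multiclass cross-entropy of the linear model X @ W."""
    probs = softmax(X @ W)
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))


def averaged_sgd(X, y, n_classes, lr=0.1, n_epochs=20, seed=0):
    """Hypothetical helper: plain SGD on softmax regression that also
    returns the running mean of the iterates (Polyak-Ruppert averaging)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    W_avg = np.zeros_like(W)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            x = X[i:i + 1]                # one sample, shape (1, n_features)
            grad = softmax(x @ W)         # predicted class probabilities
            grad[0, y[i]] -= 1.0          # d(loss)/d(scores) = p - onehot(y)
            W -= lr * (x.T @ grad)        # ordinary SGD step
            t += 1
            W_avg += (W - W_avg) / t      # incremental mean of all iterates
    return W, W_avg
```

The averaging line is the only addition to ordinary SGD: the optimizer still
steps on W, and W_avg is what one would use at prediction time. In practice
people often start averaging only after some burn-in period, which this
sketch does not do.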

Thanks, this is very interesting.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
