On 20 March 2012 22:38, David Warde-Farley <[email protected]> wrote:
> On Tue, Mar 20, 2012 at 10:16:22PM +0100, Olivier Grisel wrote:
>> On 20 March 2012 22:06, David Warde-Farley <[email protected]>
>> wrote:
>> > On Tue, Mar 20, 2012 at 09:05:01PM +0100, David Marek wrote:
>> >
>> >> I found loss functions in sgd_fast.pyx. Shouldn't they be used?
>> >
>> > SGD is a minimization strategy, independent of any particular loss
>> > function. The hinge loss and log loss are implemented, but other losses
>> > are possible: e.g. multiclass cross-entropy, which is very popular in the
>> > neural networks literature (more so than one-vs-all or one-vs-one hinge
>> > loss), or squared or absolute error for regression tasks.
>>
>> Hi David,
>>
>> We would indeed need a multiclass cross-entropy loss function for an
>> MLP implementation, but it would also be useful for linear models, to
>> train multiclass linear models directly without one-vs-all.
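
For reference, the multinomial (softmax) cross-entropy SGD update is
straightforward to sketch. The snippet below is a hypothetical NumPy
illustration of the idea only, not the sgd_fast.pyx implementation; all
names (`softmax`, `sgd_step`, etc.) are made up for this example:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sgd_step(W, b, x, y, lr=0.1):
    """One SGD step on the multiclass cross-entropy loss.

    W: (n_classes, n_features) weights, b: (n_classes,) biases,
    x: (n_features,) sample, y: integer class label.
    """
    p = softmax(np.dot(W, x) + b)   # predicted class probabilities
    grad = p.copy()
    grad[y] -= 1.0                  # dL/dz for softmax + cross-entropy
    W -= lr * np.outer(grad, x)     # dL/dW = grad * x^T
    b -= lr * grad
    return W, b

# Toy usage: three classes, two features, one pass over random data.
rng = np.random.RandomState(0)
W, b = np.zeros((3, 2)), np.zeros(3)
X = rng.randn(30, 2)
y = rng.randint(0, 3, size=30)
for xi, yi in zip(X, y):
    W, b = sgd_step(W, b, xi, yi)
```

The point is that a single weight matrix over all classes is trained
jointly, so no one-vs-all decomposition is needed.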
>>
>> Regarding optimizers for ANNs, do you know whether Polyak-Ruppert
>> averaging is useful in practice for SGD on non-linear models such as
>> feed-forward MLPs or autoencoders?
>
> AFAIK, yes. Yann LeCun's group won the optimization challenge at
> the "Challenges in Hierarchical Models" workshop at NIPS using Polyak-
> averaged SGD on a deep autoencoder.
>
> http://cs.nyu.edu/~zsx/nips2011/
>
> IIRC the theory behind Polyak averaging is pretty robust to the choice of
> model and loss function.

Thanks, this is very interesting.
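
For anyone following along, the Polyak-Ruppert scheme discussed above is
simple to sketch: run plain SGD and return the running average of the
iterates instead of the last one. The snippet below is a toy NumPy
illustration on a least-squares problem of my own choosing, not code from
any of the libraries mentioned:

```python
import numpy as np

# Toy linear regression data with a known ground-truth weight vector.
rng = np.random.RandomState(42)
n, d = 200, 3
true_w = np.array([1.0, -2.0, 0.5])
X = rng.randn(n, d)
y = np.dot(X, true_w) + 0.01 * rng.randn(n)

w = np.zeros(d)      # raw SGD iterate
w_avg = np.zeros(d)  # Polyak-Ruppert running average of the iterates
t = 0
for epoch in range(5):
    for i in rng.permutation(n):
        t += 1
        grad = (np.dot(X[i], w) - y[i]) * X[i]  # squared-error gradient
        w -= 0.05 * grad
        w_avg += (w - w_avg) / t  # online mean of all iterates so far

# w_avg, not w, is used as the final parameter estimate.
```

The averaging costs one extra vector and smooths out the noise of the
individual SGD steps, which is why it is largely insensitive to the choice
of model and loss.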

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
