Hi

> I'd emphasize that "SGD" is a class of algorithms, and the implementations
> that exist are purely for the linear classifier setting. I'm not sure how
> much use they will be in an SGD-for-MLP (they can maybe be reused for certain
> kinds of output layers), but there is definitely more work in efficiently
> computing the gradient.

You are right. I have been using "SGD" where I think I should have
said "efficient backprop". I am not sure what I should call the
existing code; is "efficient backpropagation" wrong? As in "it's
stochastic backpropagation".

> I'm unsure, but if you're as familiar as you say with backpropagation, this
> doesn't seem like that much actual code for a 2.5 month stretch you've
> projected.

I agree. I wasn't sure which algorithms to implement, as I don't have
much real-world experience.

> If possible, I wouldn't limit yourself to vanilla SGD as the only avenue for
> optimization.  For small problems/model sizes, other avenues are worth
> exploring, e.g.
>
> - batch gradient descent with delta-bar-delta adaptation
>  ( http://www.bcs.rochester.edu/people/robbie/jacobs.nn88.pdf )
>  once you have the gradient formula taken care of, this is a few
>  relatively simple lines of NumPy.
> - miscellaneous numerical optimizers in scipy.optimize, in particular
>  the "minimize" function which provides a unified interface to all the
>  different optimization strategies.

In the NN course I attended, we covered these first-order algorithms:
Silva & Almeida, delta-bar-delta, and SuperSAB. Delta-bar-delta
shouldn't be hard to implement.
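
To check my understanding of the rule, here is roughly what I have in
mind (just a NumPy sketch; the function name, signature and default
hyper-parameters are placeholders, not the interface I would actually
propose):

    import numpy as np

    def delta_bar_delta(grad, w, n_iter=100, eta0=0.01,
                        kappa=0.01, phi=0.5, theta=0.7):
        # Batch gradient descent with one adaptive learning rate per
        # weight, following Jacobs (1988). grad(w) returns the
        # full-batch gradient.
        eta = np.full_like(w, eta0)
        delta_bar = np.zeros_like(w)   # exponential average of past gradients
        for _ in range(n_iter):
            g = grad(w)
            agreement = delta_bar * g
            eta[agreement > 0] += kappa          # same sign: additive increase
            eta[agreement < 0] *= (1.0 - phi)    # sign flip: multiplicative decrease
            w = w - eta * g
            delta_bar = (1.0 - theta) * g + theta * delta_bar
        return w

If that matches what you meant, it really is only a few lines once the
gradient is available.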

> In addition, beyond basic SGD, make sure you *at least* implement support for
> a momentum term; this can help enormously with rapidly traversing and
> escaping plateaus in the error surface, and is trivial to implement once
> you are already computing the gradient. Polyak averaging may be another
> useful avenue given any spare time.

Momentum will definitely be supported. I hadn't heard of Polyak
averaging before; could you please point me to an article about it? I
have found "Acceleration of Stochastic Approximation by Averaging" by
B. T. Polyak and A. B. Juditsky; is that the right one?
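
For momentum I would start from the classical update, something along
these lines (again only a sketch; and if the Polyak/Juditsky paper is
the one you meant, the running average at the end is my guess at how
the averaging would be used):

    import numpy as np

    def sgd_momentum(grad, w, X, y, n_epochs=10, eta=0.01, mu=0.9):
        # Plain SGD with a classical momentum term; grad(w, x, y)
        # returns the gradient on a single sample.
        velocity = np.zeros_like(w)
        w_avg = np.zeros_like(w)          # Polyak-Ruppert averaged iterate
        t = 0
        for _ in range(n_epochs):
            for i in np.random.permutation(len(X)):
                velocity = mu * velocity - eta * grad(w, X[i], y[i])
                w = w + velocity
                t += 1
                w_avg += (w - w_avg) / t  # running mean of the iterates
        return w, w_avg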

> I presume when you mention "Levenberg-Marquardt" you mean the
> stochastic-diagonal version referenced in the "Efficient Backprop" paper?
> This is very different than regular Levenberg-Marquardt and you should
> include this distinction.

I have looked at the Levenberg-Marquardt in "Efficient Backprop", the
one in my course notes, and the one used in MATLAB, and they all seem
to be the same (or very similar). So I guess I really meant the
stochastic-diagonal version; I didn't know there was another one. I'll
make that explicit in the proposal.
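
If I read the paper correctly, the idea is to keep a running estimate
of the diagonal of the Hessian and use it to scale a per-weight
learning rate, roughly like this (sketch of a single update step as I
understand it; please correct me if this is not the version you
meant):

    def sdlm_update(w, grad, h, d2E_dw2, eta=0.01, mu=0.1, gamma=0.01):
        # Stochastic diagonal Levenberg-Marquardt step: h is a running
        # estimate of the diagonal second derivatives (Gauss-Newton
        # approximation), mu keeps the step bounded where the curvature
        # estimate is small.
        h = (1.0 - gamma) * h + gamma * d2E_dw2
        w = w - (eta / (mu + h)) * grad
        return w, h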

> Other comments:
> - Ideally, some amount of testing should be done in parallel with
>  development. You will inevitably be ad-hoc testing your implementation
>  as you go, don't throw that code away but put it in a unit test.

Right, but I thought there should be a period at the end when I take
all the tests created along the way, polish them, and put them
together. I will shorten that period, since I will already be writing
some tests and documentation during the implementation phase, and I
will need more time for implementation if I add more algorithms. In
that case I will also have to specify the time schedule more
precisely.

> - I'd like to see more than simply "add tests". Specifically, you should give
>  some thought into how exactly you are going to go about unit-testing your
>  implementation. This will require some careful thought about the nature of
>  MLPs themselves, and how to go about verifying correctness. Regression
>  tests (e.g. making sure you get the same output as before given the same
>  random seed) are good for catching bugs introduced by refactoring, but they
>  are not the whole story.

I will look at how other neural network libraries test their code. So
far, I can say FANN doesn't have many tests.

> Otherwise, it seems like a good proposal. As I said, it seems like a rather
> small amount of actual implementation, even if you are only budgeting the
> first half of the work period. I would look for some additional features to
> flesh out the implementation side of the proposal.

OK, how about implementing more learning algorithms:
* efficient backprop -- almost done
* delta-bar-delta -- batch learning, shouldn't be that hard
* resilient backprop (Rprop) -- to me it looks similar to DBD (rough
sketch of the update after this list), but it is supposed to be one of
the fastest algorithms, so it would be nice to benchmark it
* stochastic diagonal Levenberg-Marquardt -- a second-order algorithm
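
For Rprop, this is the update I have in mind (the variant without
weight backtracking; just a sketch to show why it feels close to DBD:
only the sign of the gradient is used, and each per-weight step is
grown or shrunk depending on sign agreement):

    import numpy as np

    def rprop_step(w, grad, prev_grad, step,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
        # One batch Rprop update without weight backtracking: per-weight
        # step sizes grow while the gradient keeps its sign and shrink
        # when it flips; the gradient magnitude itself is ignored.
        same = grad * prev_grad
        step[same > 0] = np.minimum(step[same > 0] * eta_plus, step_max)
        step[same < 0] = np.maximum(step[same < 0] * eta_minus, step_min)
        w = w - np.sign(grad) * step
        return w, step, grad.copy()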

Thanks for your comments

Dave (trying to distinguish our names :-)
