Hi

> I'd emphasize that "SGD" is a class of algorithms, and the implementations
> that exist are purely for the linear classifier setting. I'm not sure how
> much use they will be in an SGD-for-MLP (they can maybe be reused for certain
> kinds of output layers), but there is definitely more work in efficiently
> computing the gradient.
You are right. I have been using "SGD" where I think I should have said
"efficient backprop". I am not sure what to call the existing code -- is
"efficient backpropagation" wrong? As in "it's stochastic backpropagation"?

> I'm unsure, but if you're as familiar as you say with backpropagation, this
> doesn't seem like that much actual code for a 2.5 month stretch you've
> projected.

I agree. I wasn't sure which algorithms should be implemented, as I don't have
much real-life experience.

> If possible, I wouldn't limit yourself to vanilla SGD as the only avenue for
> optimization. For small problems/model sizes, other avenues are worth
> exploring, e.g.
>
> - batch gradient descent with delta-bar-delta adaptation
>   ( http://www.bcs.rochester.edu/people/robbie/jacobs.nn88.pdf )
>   once you have the gradient formula taken care of, this is a few
>   relatively simple lines of NumPy.
> - miscellaneous numerical optimizers in scipy.optimize, in particular
>   the "minimize" function which provides a unified interface to all the
>   different optimization strategies.

In the NN course I attended, we covered these first-order algorithms:
Silva & Almeida, delta-bar-delta, SuperSAB. Delta-bar-delta shouldn't be hard
to implement (rough sketch at the end of this mail).

> In addition, beyond basic SGD, make sure you *at least* implement support for
> a momentum term; this can help enormously with rapidly traversing and
> escaping plateaus in the error surface, and is trivial to implement once
> you are already computing the gradient. Polyak averaging may be another
> useful avenue given any spare time.

Momentum should definitely be supported (sketch at the end of this mail).
I hadn't heard about Polyak averaging before. Could you please point me to an
article about it? I have found "Acceleration of Stochastic Approximation by
Averaging" by B. T. Polyak and A. B. Juditsky -- is that the right one?

> I presume when you mention "Levenberg-Marquardt" you mean the
> stochastic-diagonal version referenced in the "Efficient Backprop" paper?
> This is very different than regular Levenberg-Marquardt and you should
> include this distinction.

I have looked at the Levenberg-Marquardt method in "Efficient Backprop", the
one in my course notes and the one used in Matlab, and they all seem to be the
same (or very similar). So I guess I really meant the stochastic-diagonal
version; I didn't know there was another one. I'll make the distinction
explicit in the proposal.

> Other comments:
> - Ideally, some amount of testing should be done in parallel with
>   development. You will inevitably be ad-hoc testing your implementation
>   as you go, don't throw that code away but put it in a unit test.

Right, but I thought there should also be a period at the end when I take all
the tests I have written, polish them and put them together. I will make that
phase shorter, since I will be writing some tests and documentation during the
implementation phase, and I will need the extra time for implementation if I
add more algorithms. In that case I will also have to specify the time
schedule more precisely.

> - I'd like to see more than simply "add tests". Specifically, you should give
>   some thought into how exactly you are going to go about unit-testing your
>   implementation. This will require some careful thought about the nature of
>   MLPs themselves, and how to go about verifying correctness. Regression
>   tests (e.g. making sure you get the same output as before given the same
>   random seed) are good for catching bugs introduced by refactoring, but they
>   are not the whole story.
I will look at how other neural network libraries test their code. So far, I
can say that FANN doesn't have many tests.

> Otherwise, it seems like a good proposal. As I said, it seems like a rather
> small amount of actual implementation, even if you are only budgeting the
> first half of the work period. I would look for some additional features to
> flesh out the implementation side of the proposal.

Ok, how about implementing more learning algorithms:

 * efficient backprop -- almost done
 * delta-bar-delta -- batch learning, shouldn't be that hard
 * resilient backprop -- to me it looks similar to DBD, but it should be one
   of the fastest algorithms, so it would be nice to benchmark it
 * stochastic diagonal Levenberg-Marquardt -- second-order algorithm

I have put some rough, untested sketches of the update rules I have in mind at
the end of this mail.

Thanks for your comments, Dave (trying to distinguish our names :-)
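P.S. Here is roughly what I have in mind for plain SGD with a momentum term,
keeping a Polyak-style running average of the iterates on the side. This is
only an untested sketch, not scikit-learn API: grad_fn(w, batch) stands for
whatever ends up computing the minibatch gradient, batches is assumed to be a
list of minibatches, and the hyperparameter names are placeholders.

    import numpy as np

    def sgd_momentum_polyak(w, grad_fn, batches, n_epochs=10,
                            lr=0.01, momentum=0.9):
        velocity = np.zeros_like(w)
        w_avg = w.copy()                # Polyak average of the visited iterates
        t = 0
        for _ in range(n_epochs):
            for batch in batches:
                g = grad_fn(w, batch)
                velocity = momentum * velocity - lr * g   # momentum update
                w = w + velocity
                t += 1
                w_avg += (w - w_avg) / t   # running mean of the iterates
        # w is the last iterate, w_avg is what Polyak averaging would predict with
        return w, w_avg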
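The delta-bar-delta rule (Jacobs, 1988) as I understand it, again as an
untested sketch with a placeholder grad_fn(w) returning the full-batch
gradient; kappa, phi and theta are the additive-increase,
multiplicative-decrease and averaging constants from the paper.

    import numpy as np

    def delta_bar_delta(w, grad_fn, n_iter=100, eta0=0.01,
                        kappa=0.01, phi=0.1, theta=0.7):
        eta = eta0 * np.ones_like(w)    # one learning rate per weight
        delta_bar = np.zeros_like(w)    # exponential average of past gradients
        for _ in range(n_iter):
            delta = grad_fn(w)
            agree = delta * delta_bar
            eta[agree > 0] += kappa         # same sign: additive increase
            eta[agree < 0] *= (1.0 - phi)   # sign flip: multiplicative decrease
            w = w - eta * delta
            delta_bar = (1.0 - theta) * delta + theta * delta_bar
        return w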
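Resilient backprop, in the simple "Rprop-" variant without weight backtracking
(same caveats: untested sketch, placeholder grad_fn(w)). Only the sign of the
full-batch gradient is used; the per-weight step sizes grow while the gradient
sign is stable and shrink when it flips.

    import numpy as np

    def rprop_minus(w, grad_fn, n_iter=100, step0=0.1,
                    eta_plus=1.2, eta_minus=0.5,
                    step_min=1e-6, step_max=50.0):
        step = step0 * np.ones_like(w)   # per-weight step size
        g_prev = np.zeros_like(w)
        for _ in range(n_iter):
            g = grad_fn(w)
            change = g * g_prev
            step[change > 0] = np.minimum(step[change > 0] * eta_plus, step_max)
            step[change < 0] = np.maximum(step[change < 0] * eta_minus, step_min)
            w = w - np.sign(g) * step    # move in the downhill direction
            g_prev = g
        return w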
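For stochastic diagonal Levenberg-Marquardt, the weight update itself is just
a per-weight learning rate eta / (mu + h_i), where h_i is a running estimate
of the corresponding diagonal second derivative obtained from an extra
backward pass (the part I still need to work out for the MLP). Assuming such
an instantaneous estimate h_inst is available for the current sample, the
update step would look something like this (gamma is the decay constant of
the running estimate):

    h = gamma * h + (1.0 - gamma) * h_inst   # running diagonal Hessian estimate
    w = w - (eta / (mu + h)) * g             # mu keeps the step bounded where h is tiny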
