On Thu, Mar 29, 2012 at 06:19:48PM +0200, David Marek wrote:
> Hi,
>
> I have created a first draft of my application for gsoc. I summarized
> all ideas from last thread so I hope it makes sense. You can read it at
> https://docs.google.com/document/d/11zxSbsGwevd49JIqAiNz4Qb6cFYzHdJgH9ROegY9-qo/edit
> I would like to ask Andreas and David to have a look. Every feedback is
> welcome.
I'd emphasize that "SGD" is a class of algorithms, and the implementations
that exist are purely for the linear classifier setting. I'm not sure how
much use they will be in an SGD-for-MLP (they can maybe be reused for
certain kinds of output layers), but there is definitely more work in
efficiently computing the gradient.

If you're as familiar with backpropagation as you say, this doesn't seem
like that much actual code for the 2.5-month stretch you've projected. If
possible, I wouldn't limit yourself to vanilla SGD as the only avenue for
optimization. For small problems/model sizes, other avenues are worth
exploring, e.g.:

- Batch gradient descent with delta-bar-delta adaptation
  (http://www.bcs.rochester.edu/people/robbie/jacobs.nn88.pdf). Once you
  have the gradient computation taken care of, this is a few relatively
  simple lines of NumPy.

- The miscellaneous numerical optimizers in scipy.optimize, in particular
  the "minimize" function, which provides a unified interface to all the
  different optimization strategies.

In addition, beyond basic SGD, make sure you *at least* implement support
for a momentum term; this can help enormously with rapidly traversing and
escaping plateaus in the error surface, and it is trivial to implement once
you are already computing the gradient. Polyak averaging may be another
useful avenue, given any spare time.

I presume that when you mention "Levenberg-Marquardt" you mean the
stochastic diagonal version referenced in the "Efficient BackProp" paper?
This is very different from regular Levenberg-Marquardt, and you should
make that distinction explicit.

Other comments:

- Ideally, some amount of testing should be done in parallel with
  development. You will inevitably be ad-hoc testing your implementation
  as you go; don't throw that code away, but put it in a unit test.

- I'd like to see more than simply "add tests". Specifically, you should
  give some thought to how exactly you are going to go about unit-testing
  your implementation.
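To make the delta-bar-delta suggestion above concrete, here is a rough,
untested NumPy sketch of the Jacobs-style per-parameter rate adaptation on
a full-batch gradient. The function name and constants are my own choices,
not anything from the proposal:

```python
import numpy as np

def delta_bar_delta(grad, w0, lr0=0.01, kappa=0.001, phi=0.5,
                    theta=0.7, n_steps=200):
    """Batch gradient descent with delta-bar-delta rate adaptation.

    Each parameter keeps its own learning rate: it is increased
    additively by `kappa` while the current gradient agrees in sign
    with a decayed average of past gradients (delta_bar), and scaled
    down by `phi` when the signs disagree.
    """
    w = w0.astype(float).copy()
    lr = np.full_like(w, lr0)
    delta_bar = np.zeros_like(w)
    for _ in range(n_steps):
        g = grad(w)
        same_sign = g * delta_bar
        lr = np.where(same_sign > 0, lr + kappa, lr)  # additive increase
        lr = np.where(same_sign < 0, lr * phi, lr)    # multiplicative decrease
        delta_bar = (1 - theta) * g + theta * delta_bar
        w -= lr * g
    return w

# Toy check on a quadratic bowl f(w) = 0.5 * ||w||^2, whose gradient is w.
w = delta_bar_delta(lambda w: w, np.array([5.0, -3.0]))
```

The point is really just that, as said above, this is a handful of NumPy
lines once the gradient is available.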
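For the scipy.optimize route, `minimize` only needs a callable loss (and
ideally its gradient), so hooking an MLP objective into it is cheap. A toy
sketch, where the quadratic objective is just a stand-in for whatever loss
the MLP ends up exposing:

```python
import numpy as np
from scipy.optimize import minimize

def loss(w):
    # Stand-in objective with minimum at w = 3; an MLP would return
    # its training loss here instead.
    return np.sum((w - 3.0) ** 2)

def loss_grad(w):
    # Analytic gradient of the stand-in objective; for an MLP this
    # would be the backprop gradient.
    return 2.0 * (w - 3.0)

res = minimize(loss, x0=np.zeros(4), jac=loss_grad, method="L-BFGS-B")
```

Swapping `method=` then gives you the other strategies (CG, BFGS, etc.)
for free through the same interface.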
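And to illustrate how little the momentum term mentioned above adds once
the gradient is computed, a minimal sketch (again, names and constants are
mine):

```python
import numpy as np

def sgd_momentum(grad, w0, lr=0.01, mu=0.9, n_steps=100):
    """Gradient descent with a classical momentum term.

    `grad` is any callable returning the gradient at w; here it is a
    full-batch gradient for simplicity, but a minibatch gradient plugs
    in the same way.
    """
    w = w0.astype(float).copy()
    v = np.zeros_like(w)  # velocity: decaying sum of past gradients
    for _ in range(n_steps):
        v = mu * v - lr * grad(w)
        w += v
    return w

# Toy check on the quadratic f(w) = 0.5 * ||w||^2, gradient = w.
w = sgd_momentum(lambda w: w, np.array([5.0, -3.0]), lr=0.1, mu=0.9,
                 n_steps=500)
```

The whole change over vanilla SGD is the `v` buffer and two lines in the
loop, which is why I'd consider it a minimum requirement.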
This will require some careful thought about the nature of MLPs themselves
and about how to go about verifying correctness. Regression tests (e.g.
making sure you get the same output as before given the same random seed)
are good for catching bugs introduced by refactoring, but they are not the
whole story.

Otherwise, it seems like a good proposal. As I said, it seems like a
rather small amount of actual implementation, even if you are only
budgeting the first half of the work period for it. I would look for some
additional features to flesh out the implementation side of the proposal.

David

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
