On Thu, Jun 4, 2015 at 1:26 PM, Till Rohrmann <[email protected]> wrote:

> Maybe also the default learning rate of 0.1 is set too high.
>

Could be.

But grid search on learning rate is pretty standard practice. Running
multiple learning engines at the same time with different learning rates is
pretty plausible.

Also, using something like adagrad will knock down high learning rates very
quickly if you get a nearly divergent step. This can make initially high
learning rates quite plausible.

Reply via email to