Hey Marwa,

We have been having this problem with NPLM and have found no "real solution". There have been a couple of threads about it on the mailing list so far. The workaround we use is to lower the learning rate (from 1 to .5; if .5 doesn't work, to .25, and so on) and, to compensate, increase the number of epochs you run. Alternatively, you may try the experimental gradient-clipping code that Ashish implemented. Here's a quote from his email:

> You should be able to download the version of NPLM where the updates
> (gradient * learning_rate) are clipped between +5 and -5:
> http://www.isi.edu/~avaswani/nplm_clipped.tar.gz
> If you want to change the magnitude of the update, please change it inside
>
>     struct Clipper {
>         double operator()(double x) const {
>             return std::min(5., std::max(x, -5.));
>             //return(x);
>         }
>     };
>
> in neuralClasses.h.
> Right now, the clipping has been implemented only for standard SGD
> training, and not for adagrad or adadelta.
Cheers,
Nick

On Tue, Apr 21, 2015 at 6:17 AM, Marwa Refaie <basmal...@hotmail.com> wrote:
> Hi all,
>
> When I train BilingualLM with a large corpus, it gives 10 models.nplm files
> with small numbers, then a lot of lines of nan nan nan nan nan nan nan nan.
> It works perfectly with a smaller corpus. Any suggestions please?
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>