I had read this section, which deals with translation model combination, but it does not say much about the language model or tuning.
For instance, suppose I want to make sure that a specific expression, "titres", is translated as "equities" from French to English. Do these two words specifically have to be in the monolingual corpus of the language model, or in the parallel corpus? If two "parallel expressions" are in the tuning set but present in neither the parallel corpora nor the monolingual LM corpus, can that still trigger a good translation?

I am not sure I am being clear.

Thanks again for your help.

On 14/08/2015 20:52, Rico Sennrich wrote:
> Hi Vincent,
>
> this section describes some domain adaptation methods that are
> implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain
>
> It is incomplete (focusing on parallel data and the translation model),
> and does not recommend best practices.
>
> In general, my recommendation is to use in-domain data whenever possible
> (for the language model, translation model, and held-out in-domain data
> for tuning/testing). Out-of-domain data can help, but also hurt your
> system: the effect depends on your domains and the amount of data you
> have for each. Data selection, instance weighting, model interpolation
> and domain features are different methods that give you the benefits of
> out-of-domain data, but reduce its harmful effects, and are often better
> than just concatenating all the data you have.
>
> best wishes,
> Rico
>
>
> On 14/08/15 16:22, Vincent Nguyen wrote:
>> Hi,
>>
>> I can't find any kind of "tutorial" on which domain adaptation path to follow.
>> I read this in the doc:
>> The language model should be trained on a corpus that is suitable to the
>> domain. If the translation model is trained on a parallel corpus, then
>> the language model should be trained on the output side of that corpus,
>> although using additional training data is often beneficial.
>>
>> And in the training section of the EMS, there is a subsection with
>> domain-features=....
>>
>> What is the best practice?
>>
>> Let's say, for instance, that I would like to specialize my model in
>> finance translation, with a specific corpus.
>>
>> Should I train the language model on finance data?
>> Should I include a parallel corpus in the translation model training?
>> Should I tune with financial data sets?
>>
>> Please help me to understand.
>> Vincent
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support