Hi Vincent,

This section of the Moses documentation describes some domain adaptation methods that are implemented in Moses:
http://www.statmt.org/moses/?n=Advanced.Domain
It is incomplete (it focuses on parallel data and the translation model) and does not recommend best practices.

In general, my recommendation is to use in-domain data whenever possible: for the language model and the translation model, plus held-out in-domain data for tuning and testing. Out-of-domain data can help, but it can also hurt your system; the effect depends on your domains and on how much data you have for each. Data selection, instance weighting, model interpolation and domain features are different methods that give you the benefits of out-of-domain data while reducing its harmful effects, and they often work better than simply concatenating all the data you have.

Best wishes,
Rico

On 14/08/15 16:22, Vincent Nguyen wrote:
> Hi,
>
> I can't find a sort of "tutorial" on which domain adaptation path to follow.
> I read this in the documentation:
> The language model should be trained on a corpus that is suitable to the
> domain. If the translation model is trained on a parallel corpus, then
> the language model should be trained on the output side of that corpus,
> although using additional training data is often beneficial.
>
> And in the training section of the EMS, there is a subsection with
> domain-features=....
>
> What is the best practice?
>
> Let's say, for instance, that I would like to specialize my model in
> finance translation, with a specific corpus.
>
> Should I train the language model with finance material?
> Should I include the parallel corpus in the translation model training?
> Should I tune with financial data sets?
>
> Please help me to understand.
> Vincent
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
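As an illustration of model interpolation, one of the methods listed in the reply above, here is a minimal sketch under simplifying assumptions: toy unigram language models with add-one smoothing stand in for real n-gram models, and the in-domain weight lam is chosen by grid search to minimise perplexity on held-out in-domain text. The function names and toy sentences are hypothetical; this is not Moses code and not the API of any LM toolkit.

```python
# Hypothetical illustration of LM interpolation for domain adaptation.
# Assumptions: toy unigram LMs with add-one smoothing over a shared
# vocabulary; this is not how Moses or any specific toolkit implements it.
import math
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(heldout, p_in, p_out, lam):
    """Perplexity of held-out text under lam * p_in + (1 - lam) * p_out."""
    log_prob = sum(math.log(lam * p_in[w] + (1 - lam) * p_out[w]) for w in heldout)
    return math.exp(-log_prob / len(heldout))


def best_weight(heldout, p_in, p_out, steps=20):
    """Grid-search the in-domain weight lam in [0, 1] on held-out in-domain data."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda lam: perplexity(heldout, p_in, p_out, lam))


if __name__ == "__main__":
    in_domain = "the bank raised interest rates the market fell".split()
    out_domain = "the river bank was muddy the fish swam in the river".split()
    heldout = "the bank near the river cut rates".split()  # toy held-out in-domain text

    # Shared vocabulary so every held-out word gets nonzero probability.
    vocab = set(in_domain) | set(out_domain) | set(heldout)
    p_in = unigram_lm(in_domain, vocab)
    p_out = unigram_lm(out_domain, vocab)

    lam = best_weight(heldout, p_in, p_out)
    for weight in (0.0, lam, 1.0):
        ppl = perplexity(heldout, p_in, p_out, weight)
        print(f"lambda={weight:.2f}  perplexity={ppl:.2f}")
```

On these toy sentences the selected weight lies strictly between 0 and 1, so the interpolated model has lower held-out perplexity than either the in-domain or the out-of-domain model alone. The point of tuning the weight on held-out in-domain data, rather than simply pooling everything, is that the out-of-domain model only gets as much influence as actually helps on your target domain.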