Hi,

I found this older tutorial to be very useful as well:

"Practical Domain Adaptation" by Marcello Federico and Nicola Bertoldi
http://www.mt-archive.info/10/AMTA-2012-Bertoldi-ppt.pdf
(The document formatting is unfortunately slightly messed up.)

SMT research survey wiki:
http://www.statmt.org/survey/Topic/DomainAdaptation
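
To make the "model interpolation" idea from Rico's reply (quoted below) a bit more concrete, here is a minimal sketch in Python. It is only an illustration under toy assumptions: it uses add-one smoothed unigram models rather than real n-gram language models, the three tiny corpora are made up, and all function names (unigram_model, interpolate, perplexity, best_lambda) are hypothetical and not part of Moses or any LM toolkit. The point is just that the interpolation weight can be chosen to minimise perplexity on held-out in-domain text.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    return {w: lam * p_in[w] + (1.0 - lam) * p_out[w] for w in p_in}

def perplexity(model, tokens):
    """Per-token perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

def best_lambda(p_in, p_out, heldout, steps=20):
    """Grid-search the weight that minimises held-out perplexity."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_in, p_out, lam), heldout),
    )

# Toy corpora (invented for illustration only).
in_domain = "equities fell as bond yields rose".split()
out_domain = "the cat sat on the mat and the dog barked".split()
heldout = "bond yields fell as equities rose on the news".split()

# Shared vocabulary so both models assign a probability to every word.
vocab = sorted(set(in_domain) | set(out_domain) | set(heldout))

p_in = unigram_model(in_domain, vocab)
p_out = unigram_model(out_domain, vocab)
lam = best_lambda(p_in, p_out, heldout)

print("chosen in-domain weight:", lam)
print("held-out perplexity:", perplexity(interpolate(p_in, p_out, lam), heldout))
```

Because the grid includes the weights 0 and 1, the interpolated model can never do worse on the held-out text than either model alone, which is one way to see why interpolation is often safer than simply concatenating all the data.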

Cheers,
Matthias


On Fri, 2015-08-14 at 20:37 +0100, Barry Haddow wrote:
> You could try this tutorial
> 
> http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf
> 
> On 14/08/15 20:20, Vincent Nguyen wrote:
> > I had read this section, which deals with translation model combination,
> > but not much with the language model or tuning.
> >
> > For instance, suppose I want to make sure that the specific expression
> > "titres" is translated as "equities" from French to English.
> >
> > Do these two words specifically have to be in the monolingual corpus of
> > the language model, or in the parallel corpus?
> >
> > If two "parallel expressions" are in the tuning set but present in
> > neither the parallel corpora nor the monolingual LM data, can that
> > trigger a good translation?
> >
> > I am not sure I am being clear ...
> >
> > Thanks again for your help.
> >
> >
> > On 14/08/2015 20:52, Rico Sennrich wrote:
> >> Hi Vincent,
> >>
> >> this section describes some domain adaptation methods that are
> >> implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain
> >>
> >> It is incomplete (focusing on parallel data and the translation model),
> >> and does not recommend best practices.
> >>
> >> In general, my recommendation is to use in-domain data whenever possible
> >> (for the language model, translation model, and held-out in-domain data
> >> for tuning/testing). Out-of-domain data can help, but also hurt your
> >> system: the effect depends on your domains and the amount of data you
> >> have for each. Data selection, instance weighting, model interpolation
> >> and domain features are different methods that give you the benefits of
> >> out-of-domain data, but reduce its harmful effects, and are often better
> >> than just concatenating all the data you have.
> >>
> >> best wishes,
> >> Rico
> >>
> >>
> >> On 14/08/15 16:22, Vincent Nguyen wrote:
> >>> Hi,
> >>>
> >>> I can't find any sort of "tutorial" on a domain adaptation path to follow.
> >>> I read this in the doc:
> >>> The language model should be trained on a corpus that is suitable to the
> >>> domain. If the translation model is trained on a parallel corpus, then
> >>> the language model should be trained on the output side of that corpus,
> >>> although using additional training data is often beneficial.
> >>>
> >>> And in the training section of the EMS, there is a subsection with
> >>> domain-features=....
> >>>
> >>> What is the best practice?
> >>>
> >>> Let's say, for instance, that I would like to specialize my model in
> >>> finance translation, with a specific corpus.
> >>>
> >>> Should I train the language model with finance stuff?
> >>> Should I include a parallel corpus in the translation model training?
> >>> Should I tune with financial data sets?
> >>>
> >>> Please help me to understand.
> >>> Vincent
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> Moses-support@mit.edu
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> 
> 



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

