I had read that section; it deals with translation model combination,
but says little about the language model or tuning.

For instance, suppose I want to make sure that a specific expression,
"titres", is translated as "equities" from French to English.

Do these two words specifically have to be in the monolingual corpus of
the language model, or in the parallel corpus?

If two "parallel expressions" are in the tuning set but present in
neither the parallel corpora nor the monolingual LM data, can that
trigger a good translation?

I am not sure I am being clear...

Thanks again for your help.


On 14/08/2015 20:52, Rico Sennrich wrote:
> Hi Vincent,
>
> this section describes some domain adaptation methods that are
> implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain
>
> It is incomplete (focusing on parallel data and the translation model),
> and does not recommend best practices.
>
> In general, my recommendation is to use in-domain data whenever possible
> (for the language model, translation model, and held-out in-domain data
> for tuning/testing). Out-of-domain data can help, but also hurt your
> system: the effect depends on your domains and the amount of data you
> have for each. Data selection, instance weighting, model interpolation
> and domain features are different methods that give you the benefits of
> out-of-domain data, but reduce its harmful effects, and are often better
> than just concatenating all the data you have.
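
As a rough illustration of the model interpolation mentioned above, here is a minimal sketch. It is not how Moses implements it: the function name `interpolate`, the weight values, and all probabilities are invented for the example. It only shows how a high weight on an in-domain model can let a finance translation of "titres" win over a general-domain one.

```python
def interpolate(p_in, p_out, weight_in):
    """Linearly interpolate two translation distributions p(target | source).

    p_in, p_out: dicts mapping a target phrase to its probability.
    weight_in:   weight on the in-domain model; the out-of-domain
                 model receives 1 - weight_in.
    """
    if not 0.0 <= weight_in <= 1.0:
        raise ValueError("weight_in must be between 0 and 1")
    targets = set(p_in) | set(p_out)
    return {
        t: weight_in * p_in.get(t, 0.0) + (1.0 - weight_in) * p_out.get(t, 0.0)
        for t in targets
    }


# Hypothetical values of p(english | "titres"), made up for illustration.
in_domain = {"equities": 0.7, "securities": 0.3}      # finance corpus
out_of_domain = {"titles": 0.8, "headlines": 0.2}     # general corpus

mixed = interpolate(in_domain, out_of_domain, weight_in=0.9)
best = max(mixed, key=mixed.get)
print(best)
```

With these made-up numbers, lowering `weight_in` to 0.5 makes "titles" the top candidate instead, which is the sense in which out-of-domain data can hurt when it is weighted too heavily.
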
>
> best wishes,
> Rico
>
>
> On 14/08/15 16:22, Vincent Nguyen wrote:
>> Hi,
>>
>> I can't find any kind of "tutorial" on domain adaptation to follow.
>> I read this in the doc:
>> The language model should be trained on a corpus that is suitable to the
>> domain. If the translation model is trained on a parallel corpus, then
>> the language model should be trained on the output side of that corpus,
>> although using additional training data is often beneficial.
>>
>> And in the training section of the EMS, there is a subsection with
>> domain-features=....
>>
>> What is the best practice?
>>
>> Let's say, for instance, that I would like to specialize my model in
>> finance translation, with a specific corpus.
>>
>> Should I train the language model on finance data?
>> Should I include a parallel corpus in the translation model training?
>> Should I tune with financial data sets?
>>
>> Please help me to understand.
>> Vincent
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>