Re: [Moses-support] Domain adaptation
I had read this section, which deals with translation model combination. There is not much on the language model or on tuning.

For instance: suppose I want to make sure that a specific expression, "titres", is translated as "equities" from French to English. Do these two words specifically have to be in the monolingual corpus of the language model, or in the parallel corpus? If two parallel expressions are in the tuning set but present in neither the parallel corpora nor the monolingual LM data, can that trigger a good translation?

I am not sure I am being clear. Thanks again for your help.

On 14/08/2015 20:52, Rico Sennrich wrote:

Hi Vincent,

this section describes some domain adaptation methods that are implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain

It is incomplete (focusing on parallel data and the translation model), and does not recommend best practices. In general, my recommendation is to use in-domain data whenever possible (for the language model, translation model, and held-out in-domain data for tuning/testing). Out-of-domain data can help, but also hurt your system: the effect depends on your domains and the amount of data you have for each. Data selection, instance weighting, model interpolation and domain features are different methods that give you the benefits of out-of-domain data, but reduce its harmful effects, and are often better than just concatenating all the data you have.

best wishes,
Rico

On 14/08/15 16:22, Vincent Nguyen wrote:

Hi,

I can't find any tutorial on which domain adaptation path to follow. I read this in the doc: "The language model should be trained on a corpus that is suitable to the domain. If the translation model is trained on a parallel corpus, then the language model should be trained on the output side of that corpus, although using additional training data is often beneficial." And in the training section of the EMS, there is a subsection with domain-features=

What is the best practice? Let's say, for instance, that I would like to specialize my model in finance translation, with a specific corpus. Should I train the language model on finance text? Should I include a parallel corpus in the translation model training? Should I tune with financial data sets?

Please help me to understand.

Vincent

___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Domain adaptation
Hi Vincent,

this section describes some domain adaptation methods that are implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain

It is incomplete (focusing on parallel data and the translation model), and does not recommend best practices. In general, my recommendation is to use in-domain data whenever possible (for the language model, translation model, and held-out in-domain data for tuning/testing). Out-of-domain data can help, but also hurt your system: the effect depends on your domains and the amount of data you have for each. Data selection, instance weighting, model interpolation and domain features are different methods that give you the benefits of out-of-domain data, but reduce its harmful effects, and are often better than just concatenating all the data you have.

best wishes,
Rico
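For readers who want a concrete picture of the "model interpolation" idea mentioned above, here is a minimal sketch. It is not Moses code and does not use any Moses or language-model toolkit API. It assumes toy add-alpha unigram models and a simple grid search, and every function name and sample sentence in it is hypothetical. It only illustrates the principle: mix an in-domain and an out-of-domain model, and choose the mixing weight that minimises perplexity on held-out in-domain text.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    return {w: lam * p_in[w] + (1.0 - lam) * p_out[w] for w in p_in}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


def best_weight(p_in, p_out, heldout, steps=21):
    """Grid search over [0, 1] for the weight with the lowest held-out perplexity."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_in, p_out, lam), heldout),
    )


# Hypothetical toy data: a finance-like in-domain corpus, a general
# out-of-domain corpus, and held-out in-domain text for choosing the weight.
in_domain = "equities rose as bond yields fell and equities funds grew".split()
out_domain = "the titles of the books are on the shelf and the titles are long".split()
heldout = "equities and bond funds rose".split()
vocab = set(in_domain) | set(out_domain) | set(heldout)

p_in = unigram_model(in_domain, vocab)
p_out = unigram_model(out_domain, vocab)
lam = best_weight(p_in, p_out, heldout)
mixed = interpolate(p_in, p_out, lam)

print(f"in-domain weight: {lam:.2f}")
print(f"held-out perplexity: {perplexity(mixed, heldout):.2f}")
```

Because the grid includes the weights 0 and 1, the interpolated model is never worse on the held-out text than either component model alone. That is the sense in which interpolation can beat simply concatenating all the data. Real systems would use higher-order models and a proper optimiser for the weights.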
Re: [Moses-support] Domain adaptation
Hi,

I found this older tutorial to be very useful as well:

Practical Domain Adaptation by Marcello Federico and Nicola Bertoldi
http://www.mt-archive.info/10/AMTA-2012-Bertoldi-ppt.pdf
(The document formatting is unfortunately slightly messed up.)

SMT research survey wiki:
http://www.statmt.org/survey/Topic/DomainAdaptation

Cheers,
Matthias

On Fri, 2015-08-14 at 20:37 +0100, Barry Haddow wrote:

You could try this tutorial http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf
Re: [Moses-support] Domain adaptation
You could try this tutorial http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
[Moses-support] Thread-safe Lattice Decoding
Hi:

The website documentation notes that lattice input may not work with multi-threaded decoding. Is there a reason to believe this is not likely to work? To the extent that each thread processes a single input example (lattice instead of sentence), it seems like the shared-resource issues would be no different than with sentence input.

If it is indeed not supported, can you give me some idea of what might be necessary to extend it to this use case?

Thanks!
James