Re: [Moses-support] Domain adaptation

2015-08-14 Thread Vincent Nguyen

I had read this section, which deals with translation model combination,
but it says little about the language model or tuning.

For instance, suppose I want to make sure that a specific expression,
"titres", is translated as "equities" from French to English.

Do these two words specifically have to be in the monolingual corpus of the
language model, or in the parallel corpus?

If two parallel expressions are in the tuning set but are present in neither
the parallel corpora nor the monolingual LM data, can that trigger a
good translation?

I am not sure I am being clear.

Thanks again for your help.


 On 14/08/15 16:22, Vincent Nguyen wrote:
 Hi,

 I can't find any kind of tutorial on which domain adaptation path to follow.
 I read this in the documentation:
 The language model should be trained on a corpus that is suitable to the
 domain. If the translation model is trained on a parallel corpus, then
 the language model should be trained on the output side of that corpus,
 although using additional training data is often beneficial.

 And in the training section of the EMS, there is a subsection with
 domain-features=

 What is the best practice?

 Let's say, for instance, that I would like to specialize my model in
 finance translation, with a specific corpus.

 Should I train the language model on finance data?
 Should I include a parallel corpus in the translation model training?
 Should I tune with financial data sets?

 Please help me to understand.
 Vincent

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support



Re: [Moses-support] Domain adaptation

2015-08-14 Thread Rico Sennrich
Hi Vincent,

this section describes some domain adaptation methods that are 
implemented in Moses: http://www.statmt.org/moses/?n=Advanced.Domain

It is incomplete (focusing on parallel data and the translation model), 
and does not recommend best practices.

In general, my recommendation is to use in-domain data whenever possible 
(for the language model, translation model, and held-out in-domain data 
for tuning/testing). Out-of-domain data can help, but also hurt your 
system: the effect depends on your domains and the amount of data you 
have for each. Data selection, instance weighting, model interpolation 
and domain features are different methods that give you the benefits of 
out-of-domain data, but reduce its harmful effects, and are often better 
than just concatenating all the data you have.
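
As a rough illustration of model interpolation, here is a toy Python sketch.
It is not how Moses or any language-model toolkit implements it: the unigram
models, the example sentences, and all function names are assumptions made up
for the example. It mixes an in-domain and an out-of-domain model linearly and
picks the mixing weight that gives the lowest perplexity on held-out in-domain
text.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder token for words outside the vocabulary


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (toy stand-in for a real LM)."""
    counts = Counter(tok if tok in vocab else UNK
                     for s in sentences for tok in s.split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * out-of-domain."""
    return {w: lam * p_in[w] + (1.0 - lam) * p_out[w] for w in p_in}


def perplexity(probs, sentences):
    """Per-token perplexity of the sentences under a unigram distribution."""
    tokens = [tok if tok in probs else UNK
              for s in sentences for tok in s.split()]
    log_sum = sum(math.log(probs[t]) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def best_weight(p_in, p_out, dev, steps=10):
    """Grid search for the weight with the lowest perplexity on held-out in-domain data."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda lam: perplexity(interpolate(p_in, p_out, lam), dev))


# Made-up toy corpora: a small finance (in-domain) set and a general set.
in_domain = ["the fund holds equities and bonds",
             "equities rallied after the earnings report"]
out_domain = ["the cat sat on the mat",
              "the weather is nice today",
              "the report is on the table"]
dev = ["the fund sold equities after the report"]  # held-out in-domain text

vocab = {tok for s in in_domain + out_domain for tok in s.split()} | {UNK}
p_in = train_unigram(in_domain, vocab)
p_out = train_unigram(out_domain, vocab)

lam = best_weight(p_in, p_out, dev)
print("best in-domain weight:", lam)
print("dev perplexity:", perplexity(interpolate(p_in, p_out, lam), dev))
```

The grid includes the weights 0 and 1, so the selected mixture is never worse
on the held-out text than either model alone. In this toy setting, that is one
way to see how interpolation can keep the benefit of out-of-domain data while
limiting its harm.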

best wishes,
Rico




Re: [Moses-support] Domain adaptation

2015-08-14 Thread Matthias Huck
Hi,

I found this older tutorial to be very useful as well:

Practical Domain Adaptation by Marcello Federico and Nicola Bertoldi
http://www.mt-archive.info/10/AMTA-2012-Bertoldi-ppt.pdf
(The document formatting is unfortunately slightly messed up.)

SMT research survey wiki:
http://www.statmt.org/survey/Topic/DomainAdaptation

Cheers,
Matthias


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] Domain adaptation

2015-08-14 Thread Barry Haddow
You could try this tutorial

http://www.statmt.org/mtma15/uploads/mtma15-domain-adaptation.pdf



[Moses-support] Thread-safe Lattice Decoding

2015-08-14 Thread James H. Cross III
Hi:

The website documentation notes that lattice input may not work with
multi-threaded decoding. Is there a reason to believe this is not
likely to work? To the extent that each thread processes a single
input example (lattice instead of sentence), it seems like the
shared-resource issues would be no different than with sentence input.

If it is indeed not supported, can you give me some idea of what might
be necessary to extend it to this use case?

Thanks!
James