Hello Nat,

For NMT ensembles, you just average the probability distributions of the different models at each time step before selecting the next hypothesis (or hypotheses, in beam search). If you're familiar with Moses, this is similar to what happens when we combine different feature functions in the global log-linear model. That's also why I don't think the comparison of a neural network ensemble to Moses is unfair in principle - both combine several models to obtain the final translation probabilities - the Moses phrase table alone has (at least) four.
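
In code, the per-step averaging looks roughly like this (a minimal sketch in Python, with greedy search for brevity; the model objects and their predict_proba interface are hypothetical, and a real decoder would keep the k best hypotheses rather than the single argmax):

import numpy as np

def ensemble_greedy_decode(models, source, eos_id, max_len=100):
    # Decode one target sentence by averaging the per-step probability
    # distributions of all models in the ensemble.
    hypothesis = []
    for _ in range(max_len):
        # each model returns a distribution over the target vocabulary,
        # conditioned on the source sentence and the partial hypothesis
        step_probs = [m.predict_proba(source, hypothesis) for m in models]
        avg_probs = np.mean(step_probs, axis=0)   # arithmetic mean across models
        next_token = int(np.argmax(avg_probs))    # beam search would keep the k best instead
        hypothesis.append(next_token)
        if next_token == eos_id:
            break
    return hypothesis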

Our official submissions to WMT16 are ensembles, but even our single systems outperform non-neural submissions for EN->DE, EN->RO, EN->CS and DE->EN (in terms of BLEU).

best wishes,
Rico

On 03/11/16 02:05, Nat Gillin wrote:
Dear Moses Community,

In recent papers, a lot of BLEU scores have been reported for ensembles of neural machine translation systems. I would like to ask whether anyone knows how these ensembles are created.

Is it some sort of averaged pooling layer at the end? Is it some sort of voting between multiple systems at every time step during decoding?

Any pointers to papers describing this magical ensemble would be great =)

Most papers just say: we ensemble, we beat Moses. Are there cases where a single model beats Moses on a normal translation task without ensembling?

Regards,
Nat


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
