Hello Nat,
for NMT ensembles, you just average the probability distributions of the
different models at each time step before selecting the next hypothesis
(or hypotheses, in beam search). If you're familiar with Moses, this is
similar to what happens when we combine different feature functions in
the global log-linear model. That's also why I don't think the
comparison of a neural network ensemble to Moses is unfair in principle
- both combine several models to obtain the final translation
probabilities - the Moses phrase table alone contributes (at least) four
feature scores.
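To make the averaging concrete, here is a minimal sketch of one decoding
step. The model objects and their predict_next() method are placeholders
I made up for illustration, not the API of any particular toolkit:

    import numpy as np

    def ensemble_step(models, history):
        # Each model returns a probability distribution over the
        # target vocabulary, given the same partial hypothesis.
        probs = [m.predict_next(history) for m in models]  # hypothetical API
        # Arithmetic mean of the distributions; some systems average
        # log-probabilities (geometric mean) instead.
        return np.mean(probs, axis=0)

    # Greedy decoding would then pick np.argmax(avg_probs);
    # beam search keeps the k best extensions of each hypothesis.
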
Our official submissions to WMT16 are ensembles, but even our single
systems outperform non-neural submissions for EN->DE, EN->RO, EN->CS and
DE->EN (in terms of BLEU).
best wishes,
Rico
On 03/11/16 02:05, Nat Gillin wrote:
Dear Moses Community,
In recent papers, many BLEU scores have been reported for ensembles
of neural machine translation systems. I would like to ask whether
anyone knows how these ensembles are created.
Is it some sort of averaged pooling layer at the end? Is it some sort
of voting among multiple systems at every time step during decoding?
Any pointers to papers describing this magical ensemble would be great =)
Most papers just say "we ensemble, we beat Moses". Are there cases
where a single model beats Moses on a standard translation task without
ensembling?
Regards,
Nat
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support