While re-running an experiment (using the exact same configuration, same
models, translating the same data) I noticed that occasionally, I get a
slightly different 1-best list.

Upon further examination, running Moses with the same config multiple times
often (but not always) produces slightly different n-best lists. This is a
bit worrisome from the perspective of being able to re-run an experiment to
reproduce results.

Is this a known issue?

I've examined the n-best lists, and it seems there are at least a couple of
interesting cases. In the simplest case, several translations of a given
sentence produce the exact same score, and these tied translations appear in
different order during different runs. This is a bit odd, but not terribly
worrisome. The stranger case is when, for a given sentence, two decoding
runs differ in content: some translations appear only in run A, and
different translations appear only in run B.
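To make the two cases concrete, here is a rough sketch of how one might diff two runs' n-best lists. It assumes the standard Moses n-best format (`sent_id ||| hypothesis ||| feature scores ||| total score`); the function names and the exact reporting are my own invention, not anything in Moses itself.

```python
from collections import defaultdict

def parse_nbest(lines):
    """Parse Moses-style n-best lines: 'sent_id ||| hypothesis ||| features ||| score'.

    Returns {sent_id: [(hypothesis, score), ...]} preserving file order.
    """
    hyps = defaultdict(list)
    for line in lines:
        parts = [p.strip() for p in line.split("|||")]
        sent_id, hyp, score = int(parts[0]), parts[1], float(parts[3])
        hyps[sent_id].append((hyp, score))
    return hyps

def compare_runs(run_a, run_b):
    """For each sentence, report hypotheses unique to each run, and whether
    the two runs contain the same hypotheses in a different order
    (the typical symptom of tied scores being ordered nondeterministically)."""
    report = {}
    for sid in set(run_a) | set(run_b):
        a, b = run_a.get(sid, []), run_b.get(sid, [])
        only_a = {h for h, _ in a} - {h for h, _ in b}
        only_b = {h for h, _ in b} - {h for h, _ in a}
        reordered = (a != b) and not only_a and not only_b
        report[sid] = {"only_a": only_a, "only_b": only_b,
                       "reordered_ties": reordered}
    return report

# Hypothetical example: two runs with identical, tied-score hypotheses
# emitted in opposite order (the "simplest case" above).
run_a = parse_nbest(["0 ||| a house ||| lm: -1 ||| -2.0",
                     "0 ||| the house ||| lm: -1 ||| -2.0"])
run_b = parse_nbest(["0 ||| the house ||| lm: -1 ||| -2.0",
                     "0 ||| a house ||| lm: -1 ||| -2.0"])
print(compare_runs(run_a, run_b))
```

In the second, stranger case the `only_a`/`only_b` sets would be non-empty, which cannot be explained by tie-breaking alone.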

I have these n-best lists available for review - I'll post them in a
separate email in this thread.

Cheers,
Lane
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support