We seem to have a number of posts all talking about non-determinism in
Moses, so here is a full answer.

--In general, machine translation training is a non-convex
optimisation problem.  This means there are multiple solutions (local
optima), and each time you run a full training job you will get
different results.  In particular, you will see different results
when running Giza++ (any flavour, including MGIZA) and MERT.

--The best (and most expensive) way to deal with this is to run the
full pipeline, from scratch, multiple times.  This will give you a
feel for the variance, i.e. the spread of results across runs; a
minimal sketch of how to summarise that spread follows below.  In
general, variance arising from Giza++ is less damaging than variance
arising from MERT.
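
To make "a feel for variance" concrete, here is a minimal sketch in
Python (nothing Moses-specific; the BLEU values are made-up
placeholders, one per complete rerun of the pipeline) of reporting the
mean and spread over repeated runs:

    # Summarise metric scores from several end-to-end reruns.
    # The numbers below are illustrative placeholders, not real results.
    from statistics import mean, stdev

    bleu_runs = [0.210, 0.205, 0.214, 0.208, 0.211]  # one BLEU score per rerun

    print("runs:", len(bleu_runs))
    print("mean BLEU: %.4f" % mean(bleu_runs))
    print("sample std dev: %.4f" % stdev(bleu_runs))

Reporting the mean plus the standard deviation (or min/max) over,
say, three to five reruns is one common way to make scores from
different configurations comparable.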

--To reduce variance, it is best to use as much data as possible at
each stage (100 sentences for tuning is far too few; you should be
using at least 1000 sentences).  It is possible to reduce this
variability with better machine learning, but in general it will
always be there.

--Another strategy I know of is to fix everything once you have a
set of good weights and never rerun MERT.  Should you need to change,
say, the language model, you then manually alter the associated
weight in your configuration (see the illustrative snippet below).
This gives you stability, but at the obvious cost of generality.  It
is also ugly.
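
Purely as an illustration (the numbers are placeholders, and this
assumes the classic moses.ini layout), the weights live in the
decoder configuration; "manually altering the associated weight" for
the language model means editing the value under [weight-l] by hand:

    [weight-d]
    0.3
    [weight-l]
    0.5
    [weight-t]
    0.2
    0.2
    0.2
    0.2
    0.2
    [weight-w]
    -1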

Miles

On 22 November 2011 09:36, Jelita Asian <jelitay...@gmail.com> wrote:
> I'm translating English to Indonesian and vice versa using Moses.
> I have discovered that when I run on different machines, and even on the same
> machine, the results can differ, especially with tuning.
>
> So far I've discovered three places which cause the results to differ.
> 1. mert-modified.pl: I just need to activate predictable-seed.
> 2. mkcls: just set the seed for each run.
> 3. mgiza: I find that even in the first iteration, the results are already
> different:
>
> In one run:
>
> Model1: Iteration 1
> Model1: (1) TRAIN CROSS-ENTROPY 15.8786 PERPLEXITY 60246.2
> Model1: (1) VITERBI TRAIN CROSS-ENTROPY 20.5269 PERPLEXITY 1.51077e+06
> Model 1 Iteration: 1 took: 1 seconds
>
>  In second run:
>
> Model1: Iteration 1
> Model1: (1) TRAIN CROSS-ENTROPY 15.928 PERPLEXITY 62347.7
> Model1: (1) VITERBI TRAIN CROSS-ENTROPY 20.5727 PERPLEXITY 1.55952e+06
> Model 1 Iteration: 1 took: 1 seconds
>
> I have no idea where the randomization occurs in MGIZA, even after looking
> at the code, which is hard to understand.
>
> So my questions are:
> 1. How do I make the cross-entropy results in MGIZA the same across runs? I
> think randomisation occurs somewhere, but I can't find it.
>
> 2. I read in some threads that we need to run multiple times and average the
> results that we report. However, how can I find the best combination of
> training and tuning parameters if the result of each run is different? For
> example, if I want to find the best combination of alignment and reordering
> model.
>
> 3. Is it possible that tuning causes worse results? My corpus is around
> 500,000 words and I use 100 sentences for tuning. Can the sentences used for
> tuning also be used for training, or are they supposed to be separate? I used
> 100 sentences which are different from the training set. My non-tuning NIST
> and BLEU results are around 6.5 and 0.21, while the tuned results are around
> 6.1 and 0.19.
> Isn't that result a bit too low? I'm not sure how to increase it.
>
> Sorry for the multiple questions in one post. I could separate them into
> different posts, but I don't want to spam the mailing list. Thanks; any help
> will be appreciated.
>
> Best regards,
>
> Jelita
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
