>
>> --in general, Machine Translation training is non-convex.  this means
>> that there are multiple solutions and each time you run a full
>> training job, you will get different results.  in particular, you will
>> see different results when running Giza++ (any flavour) and MERT.
>>
>
> Is there no way to stop the variance in Giza++? I looked at the code but have
> no idea where it occurs.

no, this is a property of the task, not the method.  to put it another
way, there is nothing which tells the model how words are translated.
Giza++ makes a guess based upon how well it `explains' the training
data (log-likelihood / cross-entropy).  there are many ways to achieve
the same log-likelihood, and each guess amounts to a different
translation model.  on average these alternative models will all be
similar to each other (words are translated in similar ways), but in
general you will find differences.
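
here is a toy illustration of `same objective, different solutions'
--nothing to do with Giza++ internals, just gradient descent on a
function with two global minima (the code and numbers are my own
sketch):

import random

# minimise f(x) = (x^2 - 1)^2, which has two global minima: x = -1 and x = +1
def minimise(seed, steps=200, lr=0.05):
    random.seed(seed)
    x = random.uniform(-2.0, 2.0)          # random start, like a random initialisation
    for _ in range(steps):
        x -= lr * 4 * x * (x * x - 1.0)    # gradient step on f
    return x, (x * x - 1.0) ** 2

for seed in range(5):
    x, loss = minimise(seed)
    print("seed %d: x = %+.4f  loss = %.6f" % (seed, x, loss))

different seeds end up at x = +1 or x = -1: the objective value is
identical, but the `model' is not.  alignment and tuning behave the
same way, just in millions of dimensions.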


>>
>> --the best way to deal with this (and the most expensive) would be to
>> run the full pipeline, from scratch, multiple times.  this will give
>> you a feel for the variance --differences in results.  in general,
>> variance arising from Giza++ is less damaging than variance from MERT.
>>
> How many runs are enough for this? As you say, it would be very expensive to
> do so.

how long is a piece of string?
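
(that is, there is no fixed answer.)  what you can do is watch the
spread stabilise as you add runs.  a toy sketch --the BLEU scores here
are invented, yours would come from your own pipeline runs:

import statistics

bleu = [27.3, 26.8, 27.9, 27.1, 26.5]       # hypothetical scores from 5 full runs

mean = statistics.mean(bleu)
stdev = statistics.stdev(bleu)              # run-to-run spread
stderr = stdev / len(bleu) ** 0.5           # shrinks as you add runs
print("BLEU = %.2f +/- %.2f (stderr %.2f)" % (mean, stdev, stderr))

even a handful of runs tells you the scale of the variance, which is
usually what you need to know before comparing two systems.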

>
>>
>> --to reduce variance it is best to use as much data as possible at
>> each stage.  (100 sentences for tuning is far too low;  you should be
>> using at least 1000 sentences).  it is possible to reduce this
>> variability by using better machine learning, but in general it will
>> always be there.
>>
> What do you mean by better machine learning? Isn't the 500,000-word corpus
> enough? For the 1,000 sentences for tuning, can I use the same sentences as
> used in training, or should they be separate sets of sentences?

lattice MERT is one example; the Berkeley Aligner is another.

you cannot use the same sentences for training and tuning, as has been
explained earlier on the list.
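
to make the disjointness concrete, a small sketch (the function and
file names are hypothetical, not part of the Moses tools):

import random

def split_corpus(sentence_pairs, n_tune=1000, seed=0):
    # shuffle once, then carve off a tuning set disjoint from training
    random.seed(seed)
    pairs = list(sentence_pairs)
    random.shuffle(pairs)
    return pairs[n_tune:], pairs[:n_tune]   # (train, tune) share no sentences

# usage, e.g.:
# pairs = list(zip(open("corpus.fr"), open("corpus.en")))
# train, tune = split_corpus(pairs)

if the tuning sentences also appear in the training data, MERT rewards
memorisation rather than generalisation, and your scores will be
misleadingly high.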


>
>>
>> --another strategy I know about is to fix everything once you have a
>> set of good weights and never rerun MERT.  should you need to change,
>> say, the language model, you will then manually alter the associated
>> weight.  this will mean stability, but at the obvious cost of
>> generality.  it is also ugly.
>>
> Could you elaborate a bit on the `fix everything and never rerun MERT'
> part? Do you mean that after running n times, we find the best setting of
> the variables (there are so many of them) and then don't run MERT, which I
> understand is for tuning?

if you have some problem that is fairly stable (uses the same training
set, language models, etc.) then after running MERT many times and
evaluating each run on a disjoint test set, you pick the weights that
produce good results.  afterwards you do not re-run MERT, even if you
have changed the model.

as I mentioned, this is ugly and something you do not want to do
unless you are forced to.
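
concretely, the decoder score is log-linear --a weighted sum of
feature scores-- and `fixing everything' just means freezing those
weights.  a hypothetical sketch (the feature names and numbers are
invented; this is not Moses's config format):

# weights kept from the MERT run that tested well
weights = {"lm": 0.50, "tm": 0.30, "distortion": 0.10, "word_penalty": -0.10}

def score(features):
    # log-linear model: sum_i lambda_i * h_i
    return sum(weights[name] * value for name, value in features.items())

hyp = {"lm": -42.3, "tm": -18.7, "distortion": -4.0, "word_penalty": -25.0}
print(score(hyp))

# after swapping in a new language model, nudge only its weight by hand
# instead of re-running MERT
weights["lm"] = 0.45
print(score(hyp))

the manual nudge is the ugly part: nothing guarantees 0.45 is anywhere
near what MERT would have found for the new model.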

Miles
>
> Thanks and sorry to answer it with more questions.
>
> Cheers,
>
> Jelita
>


