Hi Miles,

Thanks for your reply.

> --in general, Machine Translation training is non-convex. this means
> that there are multiple solutions and each time you run a full
> training job, you will get different results. in particular, you will
> see different results when running Giza++ (any flavour) and MERT.

Is there no way to stop the variance in Giza++? I looked at the code but have no idea where it occurs.

> --the best way to deal with this (and most expensive) would be to run
> the full pipe-line, from scratch and multiple times. this will give
> you a feel for variance --differences in results. in general,
> variance arising from Giza++ is less damaging than variance from MERT.

How many runs are enough for this? As you say, it would be very expensive to do so.

> --to reduce variance it is best to use as much data as possible at
> each stage. (100 sentences for tuning is far too low; you should be
> using at least 1000 sentences). it is possible to reduce this
> variability by using better machine learning, but in general it will
> always be there.

What do you mean by better machine learning? Isn't the 500,000-word corpus enough? For the 1,000 tuning sentences, can I use the same sentences that were used in training, or should they be a separate set?

> --another strategy I know about is to fix everything once you have a
> set of good weights and never rerun MERT. should you need to change
> say the language model, you will then manually alter the associated
> weight. this will mean stability, but at the obvious cost of
> generality. it is also ugly.

Could you elaborate a bit on the "fix everything and never rerun MERT" part? Do you mean that after running n times, we pick the best combination of variables (there are so many of them) and then stop running MERT, which I understand is for tuning?

Thanks, and sorry to answer with more questions.

Cheers,
Jelita