Hello, I'm joining this discussion because I've been having similar problems over the last few days, translating from Swedish to Danish. I trained a number of one-factor models, for which MERT tuning worked nicely and yielded very noticeable performance improvements. When I started using factored models, things suddenly got worse.
At first, my MERT results were really bad because I was using the wrong reference corpus. The input corpus needs to contain as many factors as the translation model requires, but the reference corpus must contain only one factor (the word form), so with a factored model you can't use the same variant of the corpus for both input and reference. Maybe you'll find this obvious (and in a way it is), but it took me some time to figure out.

After fixing this error, the dramatic performance drop from the first experiments went away, but the results are still not as good as they could be. MERT optimisation now sometimes improves the scores, but when it does, the improvement is only around half a BLEU point, whereas it used to gain several percentage points in the earlier experiments. Sometimes it slightly degrades performance instead.

There shouldn't be a corpus problem, as I've been using the same training, devtest and test corpora for both the more and the less successful experiments. The devtest corpus contains 1000 sentences. Is there any particular reason why MERT should perform worse with factored models? I wondered whether the number of parameters to be optimised might have an effect, but in fact one of the models I'm having problems with has only one additional weight (a POS language model), so we're not talking about a parameter explosion.

Best,
Christian

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
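(For anyone hitting the same issue: a minimal sketch of producing a single-factor reference from a factored corpus, assuming Moses's default `|` factor separator and that the surface form is factor 0 — the function name and example tokens are my own, not from Moses itself.)

```python
def strip_factors(line: str) -> str:
    """Keep only the first factor (the word form) of each token,
    assuming '|' separates factors, as in Moses's default format."""
    return " ".join(tok.split("|")[0] for tok in line.split())

# Hypothetical factored line: word|POS|lemma per token
print(strip_factors("the|DT|the house|NN|house"))  # -> "the house"
```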
