Hi, I managed to implement my custom metric with kbmira, but I keep running into weird behavior. If I do not set --model-bg, the tuning scores actually decrease steadily between iterations. With --model-bg it seems to work reasonably well (kbmira recovers from occasional decreases).
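As I understand it, kbmira follows Cherry and Foster's batch MIRA in smoothing sentence-level scores with an exponentially decayed background corpus of sufficient statistics. A minimal sketch of that idea for an F-score metric is below; the class and function names, and the decay constant, are illustrative assumptions on my side, not Moses's actual code:

```python
# Sketch of sentence-level F-score smoothed by a decayed background
# corpus, in the style of batch MIRA. Names and the decay constant are
# illustrative, not taken from the Moses source.

DECAY = 0.999  # assumed exponential decay applied to the background


def f_score(correct, proposed, gold):
    """Plain F1 from the three sufficient statistics."""
    if proposed == 0 or gold == 0:
        return 0.0
    precision = correct / proposed
    recall = correct / gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


class BackgroundFScore:
    def __init__(self, init=1.0):
        # Initializing all three background statistics to a small
        # positive value (e.g. 1) mainly smooths the first few
        # sentences; the decay washes the initial choice out over time.
        self.bg = [init, init, init]  # correct, proposed, gold

    def score(self, correct, proposed, gold):
        """Sentence score = F1 of sentence stats plus the background."""
        return f_score(correct + self.bg[0],
                       proposed + self.bg[1],
                       gold + self.bg[2])

    def update(self, correct, proposed, gold):
        """Decay the background, then add one hypothesis's statistics."""
        self.bg = [DECAY * b + s
                   for b, s in zip(self.bg, (correct, proposed, gold))]
```

The practical difference between the --model-bg settings would then be *which* hypothesis's statistics feed update() (model-best vs. oracle), which plausibly changes tuning behavior quite a bit.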
The metric is a plain F-score computed from three sufficient statistics: correct edits, proposed edits, and gold-standard edits. For PRO and kbmira, the sentence-level F-score is computed from those. I do not think the issue lies in the code of my metric itself, since it works very well with MERT and reasonably well with PRO. I am using essentially the same code for kbmira, only adding the background-corpus statistics to my metric's sufficient statistics.

So my question is: is the significantly different behavior with and without the background corpus in kbmira expected, or should I assume that I messed something up somewhere? Also, do the initial values of the background corpus actually matter? Currently I just set them all to 1.

Thanks,
Best,
Marcin

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
