Hi, I have been trying for a week now to build a factored translation model using stems and part-of-speech tags, and I cannot get satisfying results. The problem probably comes from my factor configuration, as I probably do not fully understand how it works (I am following the paper "Factored Translation Models" by Koehn and Hoang).
I previously built a standard phrase-based model (with the same corpus), which gave me a BLEU score of around 24-25 (DE-EN). For my current factored model, the BLEU score is around 1 (?). I tried loading the moses.ini files (tuned and untuned) to see whether I could get something translated by copy/pasting some lines from the original corpus, but it only translates from German to German and does not recognize most of the words, if not all.

The motivation behind the factored model is that there are too many OOVs with the standard phrase-based model, so I wanted to try using stems to reduce them. I am annotating the corpus with TreeTagger, and the factor configuration is as follows:

    input-factors = word stem pos
    output-factors = word stem pos
    alignment-factors = "word+stem -> word+stem"
    translation-factors = "stem -> stem,pos -> pos"
    reordering-factors = "word -> word"
    generation-factors = "stem -> pos,stem+pos -> word"
    decoding-steps = "t0,g0,t1,g1"

Is there something wrong with that? I only use a single language model over surface forms, as the LM over POS causes a segmentation fault in the tuning phase.

Does anyone have an idea how I should configure my model to exploit stems in the source language?

Thanks a lot,
Floran
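For reference, here is a minimal sketch of how the decoding steps in the configuration above chain together. It assumes the usual reading of the settings: `t0`/`t1` index the comma-separated entries of translation-factors, `g0`/`g1` index those of generation-factors, translation steps read input factors, and generation steps read output factors that earlier steps have already produced. The helper names (`parse_mappings`, `trace_steps`) are hypothetical and not part of Moses. The sketch only checks that the steps are structurally consistent; it says nothing about translation quality.

```python
def parse_mappings(spec):
    """Parse a spec such as 'stem -> stem,pos -> pos' into (source, target) factor tuples."""
    mappings = []
    for part in spec.split(","):
        src, tgt = part.split("->")
        mappings.append((tuple(src.strip().split("+")),
                         tuple(tgt.strip().split("+"))))
    return mappings


def trace_steps(input_factors, translation, generation, steps):
    """Walk the decoding steps in order and record which output factors each one fills.

    Assumption: translation steps (t*) read input factors, generation steps (g*)
    read output factors produced by earlier steps.
    """
    t_maps = parse_mappings(translation)
    g_maps = parse_mappings(generation)
    produced = []  # output factors, in the order they first appear
    log = []
    for step in steps.split(","):
        step = step.strip()
        kind, index = step[0], int(step[1:])
        src, tgt = (t_maps if kind == "t" else g_maps)[index]
        available = input_factors if kind == "t" else produced
        missing = [f for f in src if f not in available]
        if missing:
            raise ValueError(f"{step}: source factors not available: {missing}")
        # Factors produced a second time must agree with the earlier value.
        overlap = [f for f in tgt if f in produced]
        for f in tgt:
            if f not in produced:
                produced.append(f)
        log.append((step, src, tgt, overlap))
    return produced, log


produced, log = trace_steps(
    input_factors=["word", "stem", "pos"],
    translation="stem -> stem,pos -> pos",
    generation="stem -> pos,stem+pos -> word",
    steps="t0,g0,t1,g1",
)
for step, src, tgt, overlap in log:
    print(step, "+".join(src), "->", "+".join(tgt), "| already produced:", overlap)
print("output factors filled:", produced)
```

Under these assumptions, every output factor (stem, pos, word) is filled by some step, and the only overlap is the output POS, which is produced by `g0` and then again by `t1`.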
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support