Hello,

Following the manual for baseline creation, I trained a model on the
Europarl v9 de-en pair.
I now observe that the resulting phrase table contains a lot of noise.

For example, there are many entries with "' " and """, which seem to
distort the model and the decoder.
Truecasing also did not work properly with these special symbols, e.g.:

" ( Das sind sehr ||| ' ( these are very ||| 0.5 2.47962e-05 0.333333 7.4064e-05 ||| 0-0 1-1 2-2 3-3 4-4 ||| 2 3 1 ||| |||

Did you do any additional cleaning of the corpus before training?
Please share your experience.
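
To make the question concrete, the kind of cleaning I have in mind is
roughly the Python sketch below. It is not taken from the manual: the
function names, the quote mapping and the 0.5 punctuation threshold are
my own assumptions, and it would run on the raw sentence pairs before
tokenization and truecasing.

```python
import re
import unicodedata

# Assumed mapping: typographic quote variants -> plain ASCII quotes.
QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',
    "\u00ab": '"', "\u00bb": '"',
    "\u2018": "'", "\u2019": "'", "\u201a": "'",
})


def normalize_quotes(line):
    """Map typographic quotes to ASCII and collapse runs of whitespace."""
    line = line.translate(QUOTE_MAP)
    return re.sub(r"\s+", " ", line).strip()


def punct_ratio(line):
    """Fraction of non-space characters that are Unicode punctuation."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 1.0
    punct = sum(1 for c in chars if unicodedata.category(c).startswith("P"))
    return punct / len(chars)


def clean_pairs(pairs, max_punct_ratio=0.5):
    """Yield normalized (source, target) pairs, dropping noisy ones.

    A pair is dropped if either side is empty after normalization or
    consists mostly of punctuation (threshold is an arbitrary guess).
    """
    for src, tgt in pairs:
        src, tgt = normalize_quotes(src), normalize_quotes(tgt)
        if not src or not tgt:
            continue
        if max(punct_ratio(src), punct_ratio(tgt)) > max_punct_ratio:
            continue
        yield src, tgt
```

This would only unify quote characters and drop lines that are mostly
punctuation; it would not remove quotes that legitimately open a sentence.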

Artem Shevchenko
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
