Hello, following the manual for baseline creation, I have trained a model on the Europarl v9 de-en pair. I now observe that the resulting phrase table contains a lot of noise.
For example, many entries contain ' and " tokens, which seem to distort the model and the decoder. Truecasing, for instance, did not work properly with those special symbols:

" ( Das sind sehr ||| ' ( these are very ||| 0.5 2.47962e-05 0.333333 7.4064e-05 ||| 0-0 1-1 2-2 3-3 4-4 ||| 2 3 1 ||| |||

Did you do any additional cleaning of the corpus before training? Please share your experience.

Artem Shevchenko
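
To put a number on how widespread this noise is, one option is to count the phrase-table entries whose source or target side contains a quote token. The following is only a minimal sketch: the function names are hypothetical, the second sample line is invented for illustration, and it assumes the table is plain text with fields separated by `|||` as in the entry quoted above. It also checks for the escaped forms `&apos;` and `&quot;`, on the assumption that the tokenizer may have escaped the quotes.

```python
# Tokens treated as "quote noise". The escaped forms are included on the
# assumption that the tokenizer may have escaped quotes in the corpus.
QUOTE_TOKENS = {"'", '"', "&apos;", "&quot;"}


def parse_phrase_table_line(line):
    """Split one phrase-table line into source tokens, target tokens, scores."""
    fields = [field.strip() for field in line.split("|||")]
    source = fields[0].split()
    target = fields[1].split()
    scores = [float(value) for value in fields[2].split()]
    return source, target, scores


def has_quote_token(tokens):
    """Return True if any token is a bare or escaped quote character."""
    return any(token in QUOTE_TOKENS for token in tokens)


def count_noisy(lines):
    """Count entries with a quote token on either side; return (noisy, total)."""
    noisy = 0
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        source, target, _scores = parse_phrase_table_line(line)
        total += 1
        if has_quote_token(source) or has_quote_token(target):
            noisy += 1
    return noisy, total


sample = [
    # The entry quoted in the message above.
    '" ( Das sind sehr ||| \' ( these are very ||| '
    '0.5 2.47962e-05 0.333333 7.4064e-05 ||| 0-0 1-1 2-2 3-3 4-4 ||| 2 3 1 ||| |||',
    # Hypothetical clean entry, invented for illustration only.
    'das Haus ||| the house ||| 0.8 0.5 0.7 0.4 ||| 0-0 1-1 ||| 5 5 4 ||| |||',
]

noisy, total = count_noisy(sample)
print(f"{noisy} of {total} entries contain quote tokens")
```

For a real table, the same function could be fed the lines of the (decompressed) phrase-table file instead of the in-memory sample.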
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support