Philipp (and others),

I'm wondering what people's experience is with the point in the pipeline
at which truecasing is applied.

One option is to truecase the training data, then train your TM and LM
using that truecased data. Another option would be to lowercase the data,
train TM and LM on the lowercased data, and then perform truecasing after
decoding.
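
To make the two options concrete, here is a rough sketch in Python. It is only a toy illustration, not the Moses truecasing scripts: the function names (train_truecaser, truecase, lowercase), the tiny corpus, and the "most frequent surface form" rule are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface form of each word (hypothetical toy model).

    Sentence-initial tokens are skipped because their casing is unreliable.
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        for tok in sent.split()[1:]:
            counts[tok.lower()][tok] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(sentence, model):
    """Map each token to its preferred casing; unknown tokens are left unchanged."""
    return " ".join(model.get(tok.lower(), tok) for tok in sentence.split())


def lowercase(sentence):
    """Discard all casing information."""
    return sentence.lower()


# Made-up, pre-tokenized training data.
corpus = [
    "The meeting in Paris was short .",
    "We visited Paris and the museum .",
    "The museum in Berlin was closed .",
]
model = train_truecaser(corpus)

# Option 1: truecase the training data, then train TM and LM on it.
tm_lm_input_truecased = [truecase(s, model) for s in corpus]

# Option 2: lowercase the training data, train TM and LM on that,
# and truecase only the decoder output.
tm_lm_input_lowercased = [lowercase(s) for s in corpus]
decoder_output = "the museum in paris was closed ."  # stand-in for real decoder output
restored = truecase(decoder_output, model)
```

In this sketch, option 2 only needs the call that produces `restored` to be rerun when the truecase model changes, whereas option 1 bakes the model's decisions into the data that the TM and LM are trained on.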

I assume that the former gives better results, but the latter is easier
to extend: if you get more data and update your truecase model, you don't
have to retrain all of your TMs and LMs.

Does anyone have any insights they would care to share on this?

Thanks,
Lane
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
