Hi,

it is beneficial if the tuning set
- is representative of what you want to translate
- is a relatively literal translation, so the MT system has a chance to match the reference
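One rough way to pre-select candidates for the second point is a length-ratio check on each sentence pair. The sketch below is only an illustration, not part of Moses: the function names, the thresholds (ratio 2.0, at most 80 tokens) and the assumption that both sides are already tokenized with spaces are all made up for the example. It does not measure literalness itself, so the surviving pairs would still need a manual look.

```python
# Hypothetical thresholds, chosen only for illustration.
MAX_RATIO = 2.0   # longest side may have at most 2x the tokens of the shortest
MAX_LEN = 80      # drop very long lines on either side


def looks_literal(src, tgt, max_ratio=MAX_RATIO):
    """Crude proxy: both sides non-empty, not too long, similar token counts.

    Assumes whitespace-tokenized text on both sides.
    """
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if not (1 <= n_src <= MAX_LEN and 1 <= n_tgt <= MAX_LEN):
        return False
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


def pick_tuning_candidates(pairs, max_ratio=MAX_RATIO):
    """Keep only the (source, target) pairs that pass the length-ratio check."""
    return [(s, t) for s, t in pairs if looks_literal(s, t, max_ratio)]
```

For example, pick_tuning_candidates([("a b c", "x y z"), ("a b c d e f", "x")]) keeps only the first pair, since the second has a 6:1 token ratio.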
-phi

On Wed, Jun 24, 2015 at 12:52 PM, Dingyuan Wang <abcdoyle...@gmail.com> wrote:
> Dear all,
>
> I have collected a lot of parallel texts. A large number of them are from
> web pages and aligned by rules and algorithms, some of which lack many
> sentences on one side (5:1), so the auto alignment contains lots of errors.
> Some of them are well aligned per paragraph. A few of them are mostly single
> pieces of articles which are aligned by hand or already aligned.
> Since the amount of data is not so great (less than a hundred MB), I must
> use it efficiently.
> In all cases I would manually check the test set line by line.
> Should I prefer the high-quality data for tuning, and why?
> (I am actually seeking an explanation to convince myself to do so.)
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support