Re: [Moses-support] Use high-quality corpus for training or tuning?

Philipp Koehn Wed, 24 Jun 2015 13:14:12 -0700

Hi,

It is beneficial if the tuning set
- is representative of what you want to translate
- is a relatively literal translation, so the MT system has a chance
  to match the reference


-phi

On Wed, Jun 24, 2015 at 12:52 PM, Dingyuan Wang <abcdoyle...@gmail.com> wrote:
> Dear all,
>
> I have collected a lot of parallel texts. A large number of them are from
> web pages and were aligned by rules and algorithms; some of these lack many
> sentences on one side (5:1), so the automatic alignment contains many errors.
> Some of them are well aligned per paragraph. A few are single
> articles that were aligned by hand or were already aligned.
> Since the amount of data is not large (less than a hundred MB), I must
> use it efficiently.
> In all cases I would manually check the test set line by line.
> Should I prefer the high-quality data for tuning, and why?
> (I am actually seeking an explanation to convince myself to do so.)
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
