It's difficult to tell with that little data. Once you get beyond 100,000 segments (or at least 50,000), I would say 2,000 segments each for the dev (tuning) and test sets, and the rest for training. With that few segments it's hard to give you any recommendation, since the results might simply not be meaningful. It's currently a toy model: good for learning and playing around with options, but not for trying to infer anything from BLEU scores.
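For what it's worth, a split like that could be sketched as follows. This is just a minimal illustration, assuming the parallel corpus is held as (source, target) pairs; the function name and sizes are my own choices, not anything from the Moses toolkit itself (Moses ships its own corpus-preparation scripts):

```python
import random

def split_corpus(segments, dev_size=2000, test_size=2000, seed=42):
    """Shuffle parallel segments and carve off dev and test sets.

    Returns (train, dev, test). Shuffling first avoids accidentally
    putting one document's segments all in the same split.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    segs = list(segments)
    rng.shuffle(segs)
    dev = segs[:dev_size]
    test = segs[dev_size:dev_size + test_size]
    train = segs[dev_size + test_size:]
    return train, dev, test

# Toy example with 10 segment pairs and tiny dev/test sizes:
pairs = [(f"src {i}", f"tgt {i}") for i in range(10)]
train, dev, test = split_corpus(pairs, dev_size=2, test_size=2)
print(len(train), len(dev), len(test))  # → 6 2 2
```

With only ~10,000 segments you would of course use much smaller dev/test sizes than the 2,000 suggested above, which is why the numbers only really make sense past the 50,000–100,000 mark.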
On 22.06.2015 10:17, Hokage Sama wrote:
> Yes the language model was built earlier when I first went through the
> manual to build a French-English baseline system. So I just reused it
> for my Samoan-English system.
> Yes for all three runs I used the same training and testing files.
> How can I determine how much parallel data I should set aside for
> tuning and testing? I have only 10,028 segments (198,385 words)
> altogether. At the moment I'm using 259 segments for testing and the
> rest for training.
>
> Thanks,
> Hilton

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support