Ok, thanks. I appreciate your help.

On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
> Difficult to tell with that little data. Once you get beyond 100,000
> segments (or 50,000 at least), I would say 2000 each for the dev set (for
> tuning) and the test set, and the rest for training. With that few
> segments it's hard to give you any recommendations, since it might just
> not give meaningful results. It's currently a toy model, good for learning
> and playing around with options, but not good for trying to infer anything
> from BLEU scores.
>
> On 22.06.2015 10:17, Hokage Sama wrote:
>
>> Yes, the language model was built earlier, when I first went through the
>> manual to build a French-English baseline system. So I just reused it for
>> my Samoan-English system.
>> Yes, for all three runs I used the same training and testing files.
>> How can I determine how much parallel data I should set aside for tuning
>> and testing? I have only 10,028 segments (198,385 words) altogether. At
>> the moment I'm using 259 segments for testing and the rest for training.
>>
>> Thanks,
>> Hilton
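
The split suggested in the quoted reply (2000 segments each for dev and test, the rest for training) could be sketched in Python roughly as below. This is only an illustration and not part of Moses: the function name `split_parallel`, its parameters, and the fixed random seed are all assumptions made for the example. It shuffles segment indices once so that source and target lines stay aligned in every subset.

```python
import random


def split_parallel(src, tgt, dev_size=2000, test_size=2000, seed=0):
    """Split aligned source/target segments into train, dev and test sets.

    Hypothetical helper for illustration; the default sizes follow the
    numbers suggested in the thread for corpora beyond ~100,000 segments.
    """
    if len(src) != len(tgt):
        raise ValueError("source and target must have the same number of segments")
    if dev_size + test_size >= len(src):
        raise ValueError("not enough segments left over for training")

    # Shuffle indices (not the lists) so the two sides stay aligned.
    indices = list(range(len(src)))
    random.Random(seed).shuffle(indices)

    dev_idx = indices[:dev_size]
    test_idx = indices[dev_size:dev_size + test_size]
    train_idx = indices[dev_size + test_size:]

    def pick(ids):
        # Return (source segments, target segments) for the chosen indices.
        return [src[i] for i in ids], [tgt[i] for i in ids]

    return {"train": pick(train_idx), "dev": pick(dev_idx), "test": pick(test_idx)}
```

Each returned pair of lists can then be written out as two line-aligned files, one per language. With a corpus as small as the 10,028 segments mentioned above, the same caveat from the thread applies: whatever split is chosen, the resulting BLEU scores may not be meaningful.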
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support