It's difficult to tell with that little data. Once you get beyond 100,000 segments (or at least 50,000), I would say 2,000 segments each for the dev (tuning) and test sets, and the rest for training. With that few segments it's hard to give you any recommendation, since the results might simply not be meaningful. It's currently a toy model: good for learning and playing around with options, but not for trying to infer anything from BLEU scores.
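For what it's worth, a split like that could be sketched as follows. This is just a minimal illustration, assuming the parallel corpus is held as (source, target) pairs; the function name and sizes are my own choices, not anything from the Moses toolkit itself (Moses ships its own corpus-preparation scripts):

```python
import random

def split_corpus(segments, dev_size=2000, test_size=2000, seed=42):
    """Shuffle parallel segments and carve off dev and test sets.

    Returns (train, dev, test). Shuffling first avoids accidentally
    putting one document's segments all in the same split.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    segs = list(segments)
    rng.shuffle(segs)
    dev = segs[:dev_size]
    test = segs[dev_size:dev_size + test_size]
    train = segs[dev_size + test_size:]
    return train, dev, test

# Toy example with 10 segment pairs and tiny dev/test sizes:
pairs = [(f"src {i}", f"tgt {i}") for i in range(10)]
train, dev, test = split_corpus(pairs, dev_size=2, test_size=2)
print(len(train), len(dev), len(test))  # → 6 2 2
```

With only ~10,000 segments you would of course use much smaller dev/test sizes than the 2,000 suggested above, which is why the numbers only really make sense past the 50,000–100,000 mark.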
On 22.06.2015 10:17, Hokage Sama wrote:
> Yes the language model was built earlier when I first went through the
> manual to build a French-English baseline system. So I just reused it
> for my Samoan-English system.
> Yes for all three runs I used the same training and testing files.
> How can I determine how much parallel data I should set aside for
> tuning and testing? I have only 10,028 segments (198,385 words)
> altogether. At the moment I'm using 259 segments for testing and the
> rest for training.
>
> Thanks,
> Hilton

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support