We did some experiments a long time ago on tuning set size (for Chinese to 
English).  For the standard Moses setup, there are only a dozen or so 
meta-features to find weights for, so it's no surprise that improvements 
asymptote sharply once the tuning set grows much beyond 1,000-2,000 segment 
pairs.  (To answer one of your questions, Per, the size of the tuning set 
shouldn't have much, if anything, to do with the size of the phrase training 
dataset.)

Of course tuning algorithms like MIRA let you efficiently work with many more 
meta-features - see Chiang et al. 2009:

  http://www.aclweb.org/anthology/N/N09/N09-1025.pdf

In this case you'd expect to continue finding improvements with much larger 
tuning sets.
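
For intuition, the heart of MIRA is a large-margin, perceptron-style update 
to the weight vector. Below is a rough sketch in Python of the simplest 
(1-best, passive-aggressive) variant; it is not the algorithm of the paper 
above (which uses a more elaborate k-best version) or any toolkit's 
implementation, and the function name and the clipping constant C are just 
placeholders:

  # Sketch of a 1-best MIRA-style (passive-aggressive) weight update.
  # Feature vectors are sparse dicts; `loss` is e.g. the BLEU difference
  # between an oracle hypothesis and the current 1-best. Illustrative only.
  def mira_update(weights, oracle_feats, best_feats, loss, C=0.01):
      # Feature difference between the oracle and the current best
      keys = set(oracle_feats) | set(best_feats)
      delta = {k: oracle_feats.get(k, 0.0) - best_feats.get(k, 0.0)
               for k in keys}
      norm_sq = sum(v * v for v in delta.values())
      if norm_sq == 0.0:
          return weights
      # How badly the margin constraint (score gap >= loss) is violated
      margin = sum(weights.get(k, 0.0) * v for k, v in delta.items())
      violation = loss - margin
      if violation <= 0.0:
          return weights  # oracle already wins by a large enough margin
      step = min(C, violation / norm_sq)  # clipped (regularised) step size
      new_weights = dict(weights)
      for k, v in delta.items():
          new_weights[k] = new_weights.get(k, 0.0) + step * v
      return new_weights

Each update only touches the features that differ between the two 
hypotheses, which is part of why this family of methods copes with 
thousands of sparse features where MERT's line search does not.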

- John Burger
  MITRE

On Mar 15, 2013, at 11:07, Barry Haddow wrote:

> Hi Per
> 
> We typically use tuning sets of 1000-3000 sentences, but recently we have 
> been experimenting with larger sets (10k), which can give slightly better 
> results. It all depends on whether you care about that last 0.2 BLEU. I 
> don't think there's been any thorough investigation into tuning set size, 
> or its relation to training set size.
> 
> batch-mira works well, sometimes better than mert, but not quicker. The 
> only reading you need is the Cherry and Foster (2012) paper, which also 
> contains a good overview of tuning methods.
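> 
> If you want to try it, the switch to add to your usual mert-moses.pl 
> command line is something like
> 
>   --batch-mira --return-best-dev
> 
> where --return-best-dev keeps the weights from the iteration that scored 
> best on the tuning set rather than the last one, and --batch-mira-args 
> lets you pass extra options through to the kbmira binary. Check the 
> tuning page in the Moses documentation for the exact options in your 
> version.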
> 
> I should also mention this presentation on discriminative training
> http://www.statmt.org/mtm12/pub/discriminative-mt.pdf
> 
> cheers - Barry
> 
> On 15/03/13 12:10, Per Tunedal wrote:
>> Hi Barry,
>> I've already looked at that page, but it didn't answer my questions.
>> 
>> The most pertinent questions are practical:
>> What's the recommended size of the tuning corpus?
>> Is that size independent of the size of the training corpus, or not?
>> 
>> But, I'm interested in the theoretical aspects as well.
>> 
>> I've looked into the mert-moses.pl script:
>> maximum-iterations=i : could be a shortcut if I don't want to wait
>> forever. Any advice on a sensible limit for the number of iterations?
>> threads=i : sounds useful. But you say that I probably won't need it.
>> Why?
>> 
>> Any experience of batch-mira? pros and cons? Any reading?
>> 
>> Yours,
>> Per Tunedal
>> 
>> On Fri, Mar 15, 2013, at 10:50, Barry Haddow wrote:
>>> Hi Per
>>> 
>>> There are a lot of questions in this email. I'd strongly recommend that
>>> you have a look at this page
>>> http://www.statmt.org/moses/?n=FactoredTraining.Tuning and the
>>> references in it. But if you really want to understand tuning, you need
>>> to read this book (http://www.statmt.org/book/), particularly chapter 9.
>>> 
>>> As to memory/thread usage, Moses will use a single thread whilst
>>> loading models, then multiple threads during decoding. The mert binary
>>> shouldn't be resource-heavy with the default settings. It has its own
>>> threads parameter, but you probably don't need it.
>>> 
>>> Tuning stops when it no longer gets any improvement, typically after
>>> 10-20 iterations, although there is an upper limit of 25 (configurable).
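>>> 
>>> If you do want to cap it yourself, the relevant mert-moses.pl switches 
>>> are along the lines of
>>> 
>>>   --maximum-iterations=15 --threads=8 --decoder-flags '-threads 8'
>>> 
>>> where --maximum-iterations caps the number of tuning rounds (15 here is 
>>> just an example), --threads is the mert binary's own thread count, and 
>>> anything you want the decoder to use during the n-best decoding passes 
>>> (such as its own -threads) goes through --decoder-flags.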
>>> 
>>> cheers - Barry
>>> 
>>> On 15/03/13 08:08, Per Tunedal wrote:
>>>> Hi again,
>>>> What does the tuning actually do? Does it try to translate and check
>>>> against the reference translation in the target-language file, trying
>>>> different weights over and over again? No wonder it's time-consuming.
>>>> 
>>>> Tuning also needs a lot of memory compared to training, at least in one
>>>> of the steps, according to the system monitor: the step that uses only
>>>> one thread, in spite of the -threads parameter. Which step is that, and
>>>> why?
>>>> 
>>>> I see that some interesting files are created, with names like
>>>> run8.best100.out. I suppose those contain the most successful
>>>> translations. How are they used in the tuning?
>>>> 
>>>> The default tuner is mert; how does mert actually work to do the tuning
>>>> efficiently? How are the weights to be tested chosen? Are there any
>>>> shortcuts to take?
>>>> What's the difference from other tuners?
>>>> 
>>>> Is anyone working on a different approach to tuning, to get improved
>>>> tuning speed or improved translation quality?
>>>> 
>>>> What's the recommended size of the tuning corpus? Is that size
>>>> independent of the size of the training corpus? Does it depend on the
>>>> tuner used?
>>>> 
>>>> Yours,
>>>> Per Tunedal
>>>> 
>>>> PS: My tuning has just started round 8, after 20 hours of processing.
>>>> Will it stop after 10 rounds, or what?
>>>> 
>>>> 


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
