Log-linear weight optimisation via Bayesian adaptation in statistical machine translation. Germán Sanchis-Trilles, Francisco Casacuberta.
CoLing 2010. http://www.aclweb.org/anthology/C/C10/C10-2124.pdf

I published some stability results for MERT. There is also this paper:

Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability. J.H. Clark, C. Dyer, A. Lavie, N.A. Smith. ACL-HLT 2011. www.aclweb.org/anthology/P11-2031.pdf

where different optimizer strategies are analysed in terms of stability.



There is work by Marco Turchi that looks at how BLEU evolves as the size
of the data set used for MERT increases. The investigation is primarily for
the Spanish-English language pair, so the conclusions might not carry over
to a more challenging language pair.

The draft of the paper was titled "Learning to Translate: a statistical and
computational analysis", but I believe he published it last year in AAAI
(the results you are interested in are listed in the Appendix of that draft).

      Does anyone know of any published results which investigate the effect of
      the size of the tuning data set? I'm primarily interested in relation to
      MERT, but other optimization methods would also be interesting.

