Hi James,

Irrespective of the fact that you need to tune the weights of the log-linear
model, let me provide more references to shed light on how well established
simple pruning techniques are in our field, as well as in related fields
(namely, automatic speech recognition). This list of references might not be
what you are looking for, but maybe other readers can benefit.

V. Steinbiss, B. Tran, and H. Ney. Improvements in Beam Search. In Proc. of
the Int. Conf. on Spoken Language Processing (ICSLP’94), pages 2143-2146,
Yokohama, Japan, Sept. 1994.
http://www.steinbiss.de/vst94d.pdf

R. Zens, F. J. Och, and H. Ney. Phrase-Based Statistical Machine Translation.
In Proc. of the German Conf. on Artificial Intelligence (KI), pages 18-32,
Aachen, Germany, Sept. 2002.
https://www-i6.informatik.rwth-aachen.de/publications/download/434/Zens-KI-2002.pdf

P. Koehn. Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine
Translation Models. In Proc. of the AMTA, pages 115-124, Washington, DC, USA,
Sept./Oct. 2004.
http://homepages.inf.ed.ac.uk/pkoehn/publications/pharaoh-amta2004.pdf

R. C. Moore and C. Quirk. Faster Beam-Search Decoding for Phrasal Statistical
Machine Translation. In Proc. of MT Summit XI, European Association for
Machine Translation, Sept. 2007.
http://research.microsoft.com/pubs/68097/mtsummit2007_beamsearch.pdf

R. Zens and H. Ney. Improvements in Dynamic Programming Beam Search for
Phrase-Based Statistical Machine Translation. In Proc. of the Int. Workshop
on Spoken Language Translation (IWSLT), Honolulu, HI, USA, Oct. 2008.
http://www.mt-archive.info/05/IWSLT-2008-Zens.pdf

Cheers,
Matthias

On Wed, 2015-06-24 at 13:11 +0000, Read, James C wrote:
> Thank you for reading so carefully the draft paper I provided a link
> to, and for noticing that the Johnson paper is duly cited there. Given
> that you had already noticed this, I shall not proceed to explain the
> blindingly obvious differences between my very simple filter and their
> filter based on Fisher's exact test.
>
> Other than that, it seems painfully clear that the point I meant to
> make has not been entirely understood. If the default behaviour
> produces BLEU scores considerably lower than merely selecting the most
> likely translation of each phrase, then evidently there is something
> very wrong with the default behaviour. If we cannot agree on something
> as obvious as that, then I really can't see this discussion making any
> productive progress.
>
> James
>
> ________________________________________
> From: moses-support-boun...@mit.edu <moses-support-boun...@mit.edu> on behalf of Rico Sennrich <rico.sennr...@gmx.ch>
> Sent: Friday, June 19, 2015 8:25 PM
> To: moses-support@mit.edu
> Subject: Re: [Moses-support] Major bug found in Moses
>
> [sorry for the garbled message before]
>
> You are right. The idea is pretty obvious. It roughly corresponds to
> "histogram pruning" in this paper:
>
> Zens, R., Stanton, D., Xu, P. (2012). A Systematic Comparison of Phrase
> Table Pruning Techniques. In Proceedings of the 2012 Joint Conference on
> Empirical Methods in Natural Language Processing and Computational
> Natural Language Learning (EMNLP-CoNLL), pp. 972-983.
>
> The idea was described in the literature before that (for instance,
> Johnson et al. (2007) use only the top 30 phrase pairs per source
> phrase), and may have been used in practice for even longer. If you read
> the paper above, you will find that histogram pruning does not improve
> translation quality on a state-of-the-art SMT system, and performs
> poorly compared to more advanced pruning techniques.
>
> On 19.06.2015 17:49, Read, James C. wrote:
> > So, all I did was filter out the less likely phrase pairs and the BLEU
> > score shot up. Was that such a stroke of genius? Was that not blindingly
> > obvious?
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
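For readers following the pruning discussion, here is a minimal Python sketch of histogram pruning as described in the thread: keep at most N phrase pairs per source phrase (N = 30 in the Johnson et al. setup mentioned above; N = 1 amounts to keeping only the most likely translation of each phrase). The in-memory table format, the single score per phrase pair and the function name `histogram_prune` are assumptions for illustration only; this does not read Moses phrase-table files and is not the Moses implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A phrase pair: (source phrase, target phrase, score). Using a single score
# per pair is an assumption for illustration, e.g. a direct translation
# probability p(e|f); real phrase tables carry several feature scores.
PhrasePair = Tuple[str, str, float]


def histogram_prune(
    pairs: List[PhrasePair], top_n: int
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep at most top_n target phrases per source phrase, best score first."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")

    # Group all candidate translations by their source phrase.
    by_source: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for src, tgt, score in pairs:
        by_source[src].append((tgt, score))

    pruned: Dict[str, List[Tuple[str, float]]] = {}
    for src, candidates in by_source.items():
        # Sort best-first; sorted() is stable, so ties keep their input order.
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        pruned[src] = ranked[:top_n]
    return pruned


if __name__ == "__main__":
    # Toy table with made-up scores, for illustration only.
    table = [
        ("das haus", "the house", 0.6),
        ("das haus", "the building", 0.3),
        ("das haus", "house", 0.1),
        ("ja", "yes", 0.9),
        ("ja", "indeed", 0.1),
    ]
    print(histogram_prune(table, top_n=1))
    # {'das haus': [('the house', 0.6)], 'ja': [('yes', 0.9)]}
```

Whether pruning like this raises or lowers BLEU depends on the rest of the system, including properly tuned log-linear weights; Zens et al. (2012), cited in the thread, report that histogram pruning does not improve translation quality on a state-of-the-art system.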