[Moses-support] hypergraph decoding with the decoder
Hi all, I'm wondering how to decode a hypergraph using the -search-algorithm 5 feature of the Moses decoder. What format should the hypergraph be written in? (Is it the same format that https://github.com/kpu/lazy requires?) And what language model formats does it support? Thanks, Angli
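For concreteness, this is a minimal sketch of the invocation I have in mind (file names are placeholders, and I am assuming a chart-based moses.ini with a KenLM language model, since that is what the incremental search appears to expect):

   ~/mosesdecoder/bin/moses \
      -f moses.ini \
      -search-algorithm 5 \
      < input.txt > output.txt

What I am unsure about is where the hypergraph itself comes in, and in what format it should be written.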
[Moses-support] lattice mbr output empty translation result
Hi, I was using lattice MBR to decode the source sentences; the model was tuned with MERT. Other decoding methods, such as maximum-probability decoding and consensus decoding, produce output without a problem, but MBR decoding with the -lmbr flag makes the decoder write an empty file, whatever size, scale and pruning factor I set. In its simplest form, the command that causes the problem is essentially equivalent to the following:

   moses \
      -f moses.ini \
      -output-unknowns file1 \
      -n-best-list file2 50 \
      -output-search-graph file3 \
      -lmbr \
      (-lmbr-p 0.8 -lmbr-r 0.8 -mbr-scale 5 -lmbr-pruning-factor 50) \
      < in_file \
      > out_file

1. The parameters in parentheses are optional; either way, nothing is output by the decoder.
2. The problem, essentially, is that out_file turns out to be empty.

What could be the problem? Thanks for your input in advance! -Angli
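To narrow it down, I am also planning to try plain (non-lattice) MBR over the n-best list, to see whether the empty output is specific to the lattice variant. A minimal sketch, with the size and scale values being arbitrary placeholders:

   moses \
      -f moses.ini \
      -mbr \
      -mbr-size 100 \
      -mbr-scale 1.0 \
      < in_file \
      > out_file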
[Moses-support] nscores in phrase table binarization
Hi Moses community, What is -nscores used for as a parameter of mosesdecoder/bin/processPhraseTableMin? (In the baseline system at http://www.statmt.org/moses/?n=Moses.Baseline, this parameter is set to 4.) Thanks! Angli
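For context, this is the binarisation command from the baseline page I am referring to (quoted from memory, so the paths follow the baseline's own layout; -nscores 4 is the part I am asking about):

   ~/mosesdecoder/bin/processPhraseTableMin \
      -in train/model/phrase-table.gz -nscores 4 \
      -out binarised-model/phrase-table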
Re: [Moses-support] How to print out intermediate confusion networks / lattices?
I see! Does this mean that the default decoding algorithm and MBR/consensus decoding all rerank the n-best list extracted from the search graph, just in different ways? If so, would it make sense to try to develop a search method that directly extracts the best path from the search graph, i.e., the lattice?

Thanks,
Angli

On Fri, Mar 24, 2017 at 8:54 AM, Philipp Koehn wrote:
> Hi,
>
> the search graph does not include the MBR objective, since that is computed afterwards, on top of the n-best list extracted from the search graph.
>
> You can mix cube pruning and MBR together. As mentioned above, the "decision rule" (MBR vs. max-prob) is applied after search is finished.
>
> -phi
>
> On Fri, Mar 24, 2017 at 11:50 AM, Angli Liu wrote:
>
>> Thanks!
>>
>> Furthermore, does "output-search-graph" output the search graph only when the default objective (posterior probability) is used, or also when minimum Bayes risk decoding / consensus decoding is used (smoothed BLEU)?
>>
>> Also, is cube pruning applicable to minimum Bayes risk decoding or consensus decoding? Namely, should I turn on -search-algorithm 1 when -lmbr or -con is on?
>>
>> Thanks,
>> Angli
>>
>> On Fri, Mar 24, 2017 at 8:00 AM, Philipp Koehn wrote:
>>
>>> Hi,
>>>
>>> the option to output the search graph is called "output-search-graph"
>>>
>>> See http://www.statmt.org/moses/?n=Advanced.Search for details.
>>>
>>> The source code is in $MOSES/moses-cmd and $MOSES/moses
>>>
>>> -phi
>>>
>>> On Thu, Mar 23, 2017 at 6:30 PM, Angli Liu wrote:
>>>
>>>> Hi Moses community,
>>>>
>>>> In decoding, is it possible to have Moses output a confusion network (CN) or a word lattice (WL) instead of the decoded text for each sentence? I'm aware that one parameter of the decoder is "-inputtype", so the question is which parameter of the decoder determines the output type (among CN, WL and plain text)?
>>>>
>>>> Also, where exactly can I find the decoder code (responsible for what the "moses" binary does) inside https://github.com/moses-smt/mosesdecoder?
>>>>
>>>> Thanks,
>>>> Angli
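For concreteness, this is the kind of invocation I have in mind for inspecting what MBR actually reranks, i.e. dumping both the n-best list and the search graph alongside the MBR output (a sketch; file names and the n-best size are placeholders):

   moses \
      -f moses.ini \
      -n-best-list nbest.txt 100 distinct \
      -output-search-graph graph.txt \
      -lmbr \
      < in_file > out_file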
Re: [Moses-support] How to print out intermediate confusion networks / lattices?
Thanks!

Furthermore, does "output-search-graph" output the search graph only when the default objective (posterior probability) is used, or also when minimum Bayes risk decoding / consensus decoding is used (smoothed BLEU)?

Also, is cube pruning applicable to minimum Bayes risk decoding or consensus decoding? Namely, should I turn on -search-algorithm 1 when -lmbr or -con is on?

Thanks,
Angli

On Fri, Mar 24, 2017 at 8:00 AM, Philipp Koehn wrote:
> Hi,
>
> the option to output the search graph is called "output-search-graph"
>
> See http://www.statmt.org/moses/?n=Advanced.Search for details.
>
> The source code is in $MOSES/moses-cmd and $MOSES/moses
>
> -phi
>
> On Thu, Mar 23, 2017 at 6:30 PM, Angli Liu wrote:
>
>> Hi Moses community,
>>
>> In decoding, is it possible to have Moses output a confusion network (CN) or a word lattice (WL) instead of the decoded text for each sentence? I'm aware that one parameter of the decoder is "-inputtype", so the question is which parameter of the decoder determines the output type (among CN, WL and plain text)?
>>
>> Also, where exactly can I find the decoder code (responsible for what the "moses" binary does) inside https://github.com/moses-smt/mosesdecoder?
>>
>> Thanks,
>> Angli
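In other words, a sketch of the combination I am asking about (parameter values are placeholders):

   moses \
      -f moses.ini \
      -search-algorithm 1 \
      -cube-pruning-pop-limit 2000 \
      -lmbr \
      < in_file > out_file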
[Moses-support] How to print out intermediate confusion networks / lattices?
Hi Moses community, In decoding, is it possible to have Moses output a confusion network (CN) or a word lattice (WL) instead of the decoded text for each sentence? I'm aware that one parameter of the decoder is "-inputtype", so the question is which parameter of the decoder determines the output type (among CN, WL and plain text)? Also, where exactly can I find the decoder code (responsible for what the "moses" binary does) inside https://github.com/moses-smt/mosesdecoder? Thanks, Angli
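To make the question concrete, the closest thing I have found so far is dumping the search graph (a sketch; file names are placeholders, and my understanding is that -inputtype only controls the input side, with 0 = plain text, 1 = confusion network, 2 = word lattice):

   moses \
      -f moses.ini \
      -inputtype 0 \
      -output-search-graph searchgraph.txt \
      < in_file > out_file

What I have not found is an equivalent switch for the output side, hence the question.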
[Moses-support] BLEU score decoding word lattice
Hi all, Is there a way to do lattice decoding with BLEU in Moses? That is, given a word lattice, find the path that has the highest BLEU score. If so, what function should I call, and in what format should I feed the lattice in? Thanks! Angli
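For reference, the lattice format I have been assuming is the PLF (Python Lattice Format) that Moses reads with -inputtype 2, i.e. something like (a sketch; file names are placeholders, and the lattice line is only an illustrative made-up example of the node/edge structure, each edge being a (word, score, distance-to-next-node) triple):

   ~/mosesdecoder/bin/moses -f moses.ini -inputtype 2 < lattice.plf > out.txt

with each line of lattice.plf looking roughly like:

   ((('einen',1.0,1),),(('wettbewerbsbedingten',0.5,2),('wettbewerbs',0.25,1),('wettbewerb',0.25,1),),(('bedingten',1.0,1),),(('preissturz',0.5,2),('preis',0.5,1),),(('sturz',1.0,1),),)

My question is whether there is any way to make the path selection maximise BLEU rather than model score.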
Re: [Moses-support] Tuning for factored phrase based systems
Thank you!

On Tue, Dec 6, 2016 at 12:55 AM Sašo Kuntaric wrote:
> Hi Angli,
>
> Here is an excerpt of Hieu's answers on this topic from when I was doing research on factored models; it might be of some help:
>
> On 30/06/2016 21:44, Sašo Kuntaric wrote:
>
> Hi all,
>
> I would like to ask one more question. When you say that my reference only has the surface form, are you talking about the "tuning corpus", which in the case of my command
>
> ~/mosesdecoder/scripts/training/mert-moses.pl ~/working/IT_corpus/TMX/txt/factored_corpus/singles/tuning_corpus.tagged.clean.en ~/working/IT_corpus/TMX/txt/factored_corpus/singles/tuning_corpus.tagged.clean.sl ~/mosesdecoder/bin/moses ~/working/IT_corpus/TMX/txt/factored_corpus/singles/test/model/moses.ini --mertdir ~/mosesdecoder/bin/ --decoder-flags="-threads all"
>
> would be tuning_corpus.tagged.clean.en and tuning_corpus.tagged.clean.sl? Can tuning be done with files that only contain surface forms?
>
> It's usual that the reference tuning data does not have factors, even if there are factors in the phrase table. After all, you don't care if the output surface form is correct but the other factors are wrong.
>
> Will the results be compatible with tuning done with a factored tuning corpus?
>
> Yes.
>
> Best regards,
>
> Sašo
>
> 2016-12-04 1:37 GMT+01:00 Hieu Hoang :
>
> Hieu
> Sent while bumping into things
>
> On 1 Dec 2016 07:01, "Angli Liu" wrote:
>
> Hi, what's the major difference between the tuning process for a factored phrase-based system (i.e., surface+POS data) and a simple baseline phrase-based system?
>
> Nothing, the tuning just optimises weights for feature functions.
>
> If you decompose your translation so that it has multiple phrase tables and generation models, then they are just extra feature functions with weights to be tuned.
>
> Do I need to organize the dev set the same way as the training set (i.e., surface|pos)?
>
> Yes.
>
> Is there a tutorial on the Moses website on this topic?
>
> Maybe this: http://www.statmt.org/moses/?n=FactoredTraining.FactoredTraining
>
> Thanks!
>
> -Angli
>
> --
> lp,
>
> Sašo
[Moses-support] Tuning for factored phrase based systems
Hi, what's the major difference between the tuning process for a factored phrase-based system (i.e., surface+POS data) and a simple baseline phrase-based system? Do I need to organize the dev set the same way as the training set (i.e., surface|pos)? Is there a tutorial on the Moses website on this topic? Thanks! -Angli
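To be explicit about what I mean by surface|pos: the training data uses the standard factored notation with factors joined by a vertical bar on every token, along the lines of this made-up example line:

   the|DT house|NN is|VBZ small|JJ .|.

The question is whether the dev-set files for tuning need to look the same way, or whether plain surface-form text is enough.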
[Moses-support] BLEU score on dev set doesn't match what's reported in moses.ini
Hi - I trained a phrase-based system from a low-resource language to English and got *13.6633* as the BLEU score reported in moses.ini after tuning. However, when I decoded the same dev set and computed BLEU against the English side of the dev set, I only got *3.69*. I then did a manual grid search over the parameter space in moses.ini (the one generated at the end of tuning/development) and got a BLEU of *3.77* at best. Both recasing and tokenization were applied to the dev-set output I computed BLEU on. I'm wondering what could be the reason why the BLEU score reported in moses.ini for the dev set doesn't match the one I computed on the same dev set. Thanks. - Angli
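For reference, this is roughly how I would reproduce the score with the bundled multi-bleu script, in case the mismatch comes down to tokenization or casing of the reference (a sketch; file names are placeholders, and -lc lowercases both hypothesis and reference before scoring):

   ~/mosesdecoder/scripts/generic/multi-bleu.perl -lc dev.ref.tok < dev.hyp.tok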