Re: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?
Thanks Michael for the paper, and thanks Tom. Based on the paper, one solution is to replicate MERT and test at least three times. My ideas have subtle effects on BLEU. Do you recommend I run MERT and test three times or more? Should I increase the number of sentences for tuning? My Persian-to-English dataset includes: Training: about 24 sentences, Tune: 1000 sentences, Test: 1000 sentences.

From: tah...@precisiontranslationtools.com
Date: Sun, 11 Oct 2015 12:53:37 +0700
To: moses-support@mit.edu
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?

Yes. Each tuning run with the same test set will give you small variations in the final BLEU. Yours look like they're in a normal range.

Date: Sun, 11 Oct 2015 04:23:56 +
From: Davood Mohammadifar
Subject: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?
To: Moses Support

Hello everyone,

I noticed different BLEU scores for the same dataset. The difference is small, about 0.13. I trained my system and tuned on a development set for Persian-English translation. After testing, the score was 21.95. The second time I ran the same process I obtained 21.82. (My tools were mgiza, mert, ...) Is this difference normal?

My system: CPU: Core i7-4790K, RAM: 16GB, OS: Ubuntu 12.04

Thanks

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
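The advice in this thread - repeat MERT and test at least three times - can be made concrete by reporting the mean and sample standard deviation of the test BLEU across runs, and judging a single-pair difference against that spread. A minimal sketch (the helper names are illustrative, not part of Moses):

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Mean of BLEU scores collected from repeated MERT runs.
double mean(const std::vector<double>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Sample standard deviation (n - 1 denominator), meaningful for n >= 2.
double sample_stddev(const std::vector<double>& xs) {
    const double m = mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return std::sqrt(ss / (xs.size() - 1));
}
```

With the two runs reported here (21.95 and 21.82), the mean is 21.885 and the sample standard deviation is about 0.09, so a 0.13 gap between two tuning runs is consistent with normal MERT noise; a third run would sharpen the estimate.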
Re: [Moses-support] Segmentation Fault during Tuning
Hi,

with this modification it works. Thanks a lot,

Alex

On 12 Oct 2015, at 09:09, Philipp Koehn wrote:

Hi,

in t2, you do generate an output lemma factor - which may be the cause of this problem (even though you do not seem to use the output lemma anywhere else). Does it still core dump if you change the translation factors to:

translation-factors = "lemma -> lemma, pos -> pos, word -> word + lemma + pos"

-phi

On Sat, Oct 10, 2015 at 9:52 AM, Alex Martinez wrote:

Hello,

I'm trying to build a factored system using EMS, based on this example from the tutorial:

% train-model.perl \
   --corpus factored-corpus/proj-syndicate.1000 \
   --root-dir morphgen-backoff \
   --f de --e en \
   --lm 0:3:factored-corpus/surface.lm:0 \
   --lm 2:3:factored-corpus/pos.lm:0 \
   --translation-factors 1-1+3-2+0-0,2 \
   --generation-factors 1-2+1,2-0 \
   --decoding-steps t0,g0,t1,g1:t2 \
   --external-bin-dir .../tools

I'm getting a segmentation fault during tuning, and I have the feeling that the problem is related to the line defining the decoding steps.
What I have in my EMS config file to get a similar model is:

### factored training: specify here which factors used
# if none specified, single factor training is assumed
# (one translation step, surface to surface)
#
input-factors = word lemma pos
output-factors = word lemma pos
alignment-factors = "word+lemma -> word+lemma"
translation-factors = "lemma -> lemma, pos -> pos, word -> word + pos"
reordering-factors = "word -> word"
generation-factors = "lemma -> pos, lemma+pos -> word"
decoding-steps = "t0,g0,t1,g1:t2"
generation-type = single
prune-generation = "$moses-bin-dir/pruneGeneration 100"

The training fails in the tuning step, and I'm getting this in TUNING_tune.1.STDERR:

Executing: /opt/moses/bin/moses -threads all -v 0 -config /mnt/a62/devel/en_es/processfin/model/moses.bin.ini.1 -weight-overwrite 'WordPenalty0= -0.128205 TranslationModel0= 0.025641 0.025641 0.025641 0.025641 LM2= 0.064103 LM0= 0.064103 GenerationModel1= 0.038462 0.00 TranslationModel2= 0.025641 0.025641 0.025641 0.025641 GenerationModel0= 0.038462 PhrasePenalty0= 0.025641 Distortion0= 0.038462 TranslationModel1= 0.025641 0.025641 0.025641 0.025641 LexicalReordering0= 0.038462 0.038462 0.038462 0.038462 0.038462 0.038462 LM1= 0.064103' -n-best-list run1.best100.out 100 distinct -input-file /mnt/a62/devel/en_es/data/corpora.tuning.en > run1.out
Segmentation fault (core dumped)
Exit code: 139
The decoder died.
CONFIG WAS -weight-overwrite 'WordPenalty0= -0.128205 TranslationModel0= 0.025641 0.025641 0.025641 0.025641 LM2= 0.064103 LM0= 0.064103 GenerationModel1= 0.038462 0.00 TranslationModel2= 0.025641 0.025641 0.025641 0.025641 GenerationModel0= 0.038462 PhrasePenalty0= 0.025641 Distortion0= 0.038462 TranslationModel1= 0.025641 0.025641 0.025641 0.025641 LexicalReordering0= 0.038462 0.038462 0.038462 0.038462 0.038462 0.038462 LM1= 0.064103'
cp: cannot stat ‘/mnt/a62/devel/en_es/processfin/tuning/tmp.1/moses.ini’: No such file or directory

If I change this line in the config file from

decoding-steps = "t0,g0,t1,g1:t2"

to

decoding-steps = "t0,g0,t1,g1"

then the training ends without errors. I'd appreciate suggestions on how to solve this.

Alex
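For reference, the change Philipp suggested (and which Alex confirmed fixes the crash) keeps the backoff decoding path but makes the t2 step produce all three output factors itself, so no decoding path is left with an undefined output factor:

```
translation-factors = "lemma -> lemma, pos -> pos, word -> word + lemma + pos"
decoding-steps = "t0,g0,t1,g1:t2"
```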
Re: [Moses-support] Compact lex reordering table on OSX/clang
On 10/13/2015 06:05 PM, Marcin Junczys-Dowmunt wrote:
> yes, definitely wrong turn, all code should be in CompactPT.

Ah, I'd missed that this is a template. So what's being loaded there is an array of float. Not that much can go wrong there, within one CPU architecture...

My next stab in the dark would be the MmapAllocator... which seems to be where the error happens. But now we're into higher magic, as far as I'm concerned. Could there be a fatal difference in how std::vector interacts with the allocator?

Jeroen
Re: [Moses-support] Compact lex reordering table on OSX/clang
I'll take a closer look when I have time. I think it's been happening for a while but I've ignored it. BTW, I've pulled unblockpt into master.

Hieu Hoang
http://www.hoang.co.uk/hieu

On 13 October 2015 at 12:05, Marcin Junczys-Dowmunt wrote:
> Hi,
>
> yes, definitely wrong turn, all code should be in CompactPT.
> I am not sure this is actually a code bug - is it working with g++ on macOS?
>
> On 2015-10-13 12:50, Jeroen Vermeulen wrote:
>> On 10/13/2015 04:59 PM, Hieu Hoang wrote:
>>> you're quite right, I've added a check
>>> https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
>>> However, that's not the problem I'm having on OSX. It opens but it crashes on loading. I suspect one of the datatypes has a slightly different size on clang/OSX compared to gcc/Linux.
>>
>> Before the loading gets to this point, CanonicalHuffman.Load() does something that intrigues me, as a reader who doesn't really grok the code: it fread()s an array of Data.
>>
>> If Data is the class I find in mert/Data.h, then AFAICT the compiler would be well within its rights to break this. Not only is it not a POD, it contains pointers, including in strings and vectors. You wouldn't expect that to work. Did I take a wrong turn somewhere?
>>
>> Jeroen
Re: [Moses-support] decoding-graph-backoff
Sir,

As you mentioned, adding "-tt" when running the decoder gives us the feature scores for each phrase pair, and a non-zero value for a translation model (given that there are multiple phrase-tables) indicates which phrase-table the translation candidate came from.

I'm building a system for Hindi to Urdu, and after running the decoder with "-tt" I'm getting (small sample):

|lm=(2:-8.9735,1:-17.0476,2:-0.639163,2:-6.23737,3:-4.51238,3:-1.19594,3:-6.55882,3:-3.23557,3:-2.26149,2:-14.4246,2:-2.78677,2:-7.39963,3:-1.62681,2:-13.4158,1:-13.6408,2:-1.33196,3:-1.25049)| بغیر |0-0,wa=0-0 ,total=-0.28041, LexicalReordering0= -0.0316486 0 0 0 0 0 Distortion0= 0 LM0= -8.9735 WordPenalty0= -1 PhrasePenalty0= 1 TranslationModel0= -0.167537 -0.313897 -0.239363 -0.242946 TranslationModel1= 0 0 0 0| ابلی |1-1,wa=0-0 ,total=-0.463503, LexicalReordering0= -0.510826 0 0 -0.0511291 0 0 Distortion0= 0 LM0= -17.0476 WordPenalty0= -1 PhrasePenalty0= 1 *TranslationModel0= 0 0 0 0 TranslationModel1= 0 0 0 0*|

For the first phrase, the non-zero TranslationModel0 scores tell us that translation model 0 was used. But for the second phrase (marked with asterisks), both translation models show all zeros - does that mean neither of the two translation models was used? Is my assumption correct? The phrase was still translated; can you please tell me why this happened?

Regards,
Saumitra Yadav
Intern, LTRC
IIIT-Hyderabad

On Thu, Jul 30, 2015 at 7:13 PM, Philipp Koehn wrote:

> Hi,
>
> yes, that is correct. If there are non-zero valued scores listed with a translation model feature, then this translation model was used for the phrase pair.
>
> -phi
>
> On Wed, Jul 29, 2015 at 7:57 PM, Saumitra Yadav <yadav.saumitr...@gmail.com> wrote:
>
>> Sir,
>> Thank you for that option, it really helped.
>> I just wanted to know if I'm analysing it correctly. For an initial analysis I'm just counting how many times each phrase table was called, so in the attached file (formatted just for easy readability) please find the output of one sentence. Is it correct to say that TranslationModel0 was used 2 times and TranslationModel1 was used 5 times for the given input?
>>
>> Regards,
>> Saumitra Yadav
>> M.Tech.
>> Department Of Computer Science And Technology
>> Goa University
>>
>> On Wed, Jul 29, 2015 at 9:22 PM, Philipp Koehn wrote:
>>
>>> Hi,
>>>
>>> when you call the decoder with the option "-tt" then you get for each phrase pair a list of all feature scores. You can use this to track down which phrase table was used for each phrase translation.
>>>
>>> -phi
>>>
>>> On Wed, Jul 29, 2015 at 10:59 AM, Hieu Hoang wrote:
>>>
>>>> good question. no. You can try & write it yourself. In the TargetPhrase class, there is a method GetContainer() which points to the phrase-table that a particular rule comes from. You can use this.
>>>>
>>>> On 29/07/2015 18:51, Saumitra Yadav wrote:
>>>>
>>>> Sir,
>>>> Is there a command or argument which can tell which phrase in the output is taken from which phrase-table (in case we have multiple phrase-tables)?
>>>>
>>>> Regards,
>>>> Saumitra Yadav
>>>> M.Tech.
>>>> Department Of Computer Science And Technology
>>>> Goa University
>>>>
>>>> On Sun, Jul 26, 2015 at 11:49 AM, Hieu Hoang wrote:
>>>>
>>>>> since you have 3 phrase-tables, you may have to have 3 entries in the [decoding-graph-backoff] section, e.g.
>>>>>
>>>>> [decoding-graph-backoff]
>>>>> 0
>>>>> 3
>>>>> 3
>>>>>
>>>>> Hieu Hoang
>>>>> Researcher
>>>>> New York University, Abu Dhabi
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>> On 25 July 2015 at 20:23, Saumitra Yadav <yadav.saumitr...@gmail.com> wrote:
>>>>>
>>>>>> Sir,
>>>>>> Please find attached the moses.ini file I used; the command used was ~/Decoder/mosesdecoder/bin/moses -f moses.ini
>>>>>>
>>>>>> Regards,
>>>>>> Saumitra Yadav
>>>>>> M.Tech.
>>>>>> Department Of Computer Science And Technology
>>>>>> Goa University
>>>>>>
>>>>>> On Sat, Jul 25, 2015 at 9:21 PM, Hieu Hoang wrote:
>>>>>>
>>>>>>> can you please send me the moses.ini file that you used, that causes the segfault, and send me the exact command you typed.
>>>>>>>
>>>>>>> On 24/07/2015 14:40, Saumitra Yadav wrote:
>>>>>>>
>>>>>>> But sir, when I did that there was a *segmentation fault* while loading the first phrase-table; one workaround I found was giving the uncompressed phrase-table to the decoder.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Saumitra Yadav
>>>>>>> M.Tech.
>>>>>>> Department Of Computer Science And Technology
>>>>>>> Goa University
>>>>>>>
>>>>>>> On Thu, Jul 23, 2015 at 8:06 PM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>>>>>>
>>>>>>>> i think you have to swap the phrase tables around. The decoder always looks at the 1st phrase-table, then backs off to the 2nd if nothing is found.
>>>>>>>>
>>>>>>>> On 22/07/2015 16:59, Saumitra Yadav wrote:
>>>>>>>>
>>>>>>>> Sir/Ma'am, I'm trying to use mu
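The per-phrase bookkeeping Saumitra is doing by hand - checking which TranslationModelN features carry non-zero scores in the "-tt" trace - can be sketched as a small scanner. This assumes the simple "FeatureName= v1 v2 ..." token layout visible in the sample above; the function name is made up and real trace lines need more careful tokenizing:

```cpp
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Scan one trace segment and record, for each TranslationModelN feature,
// whether any of its scores is non-zero (i.e. that table supplied the pair).
std::map<std::string, bool> active_models(const std::string& segment) {
    std::istringstream in(segment);
    std::map<std::string, bool> used;
    std::string tok;
    std::string current;  // TranslationModelN whose values we are reading
    while (in >> tok) {
        if (!tok.empty() && tok.back() == '=') {  // start of a feature block
            const std::string name = tok.substr(0, tok.size() - 1);
            if (name.rfind("TranslationModel", 0) == 0) {
                current = name;
                used.emplace(current, false);
            } else {
                current.clear();  // some other feature; ignore its values
            }
        } else if (!current.empty()) {
            char* end = nullptr;
            const double v = std::strtod(tok.c_str(), &end);
            if (end != tok.c_str() && v != 0.0)
                used[current] = true;  // this table scored the pair
        }
    }
    return used;
}
```

On the second phrase in the sample, both models would come back false, matching Saumitra's observation; note that all-zero scores are ambiguous (a table can legitimately assign a zero log-score), which is why GetContainer() inside the decoder is the more reliable route Hieu mentions.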
Re: [Moses-support] Compact lex reordering table on OSX/clang
Hi,

yes, definitely wrong turn, all code should be in CompactPT. I am not sure this is actually a code bug - is it working with g++ on macOS?

On 2015-10-13 12:50, Jeroen Vermeulen wrote:

> On 10/13/2015 04:59 PM, Hieu Hoang wrote:
>
>> you're quite right, i've added a check
>> https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
>> However, that's not the problem I'm having on OSX. It opens but it crashes on loading. I suspect one of the datatypes has a slightly different size on clang/OSX compared to gcc/Linux.
>
> Before the loading gets to this point, CanonicalHuffman.Load() does something that intrigues me, as a reader who doesn't really grok the code: it fread()s an array of Data.
>
> If Data is the class I find in mert/Data.h, then AFAICT the compiler would be well within its rights to break this. Not only is it not a POD, it contains pointers, including in strings and vectors. You wouldn't expect that to work. Did I take a wrong turn somewhere?
>
> Jeroen
Re: [Moses-support] Compact lex reordering table on OSX/clang
On 10/13/2015 04:59 PM, Hieu Hoang wrote:

> you're quite right, i've added a check
> https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
> However, that's not the problem I'm having on OSX. It opens but it crashes on loading.
>
> I suspect one of the datatypes has a slightly different size on clang/OSX compared to gcc/Linux.

Before the loading gets to this point, CanonicalHuffman.Load() does something that intrigues me, as a reader who doesn't really grok the code: it fread()s an array of Data.

If Data is the class I find in mert/Data.h, then AFAICT the compiler would be well within its rights to break this. Not only is it not a POD, it contains pointers, including in strings and vectors. You wouldn't expect that to work. Did I take a wrong turn somewhere?

Jeroen
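Jeroen's POD concern can be turned into a compile-time guard: any type handed to fread() for binary deserialization can be required to be trivially copyable, so a class with std::string or std::vector members fails to compile instead of crashing at load time. A sketch of the idea, not the actual Moses code:

```cpp
#include <cstddef>
#include <cstdio>
#include <type_traits>

// Wrapper around fread() that refuses, at compile time, to fill in any
// type whose bytes cannot be copied verbatim. Reading raw bytes into a
// type holding pointers (strings, vectors) would rebuild dangling state.
template <typename T>
std::size_t checked_fread(T* out, std::size_t n, std::FILE* f) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "binary deserialization requires a trivially copyable type");
    return std::fread(out, sizeof(T), n, f);
}
```

An array of float - what the template in question actually loads, per the follow-up in this thread - passes the assertion; instantiating the wrapper with a class like mert/Data.h's Data would not compile, which is exactly the property one wants here.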
Re: [Moses-support] Compact lex reordering table on OSX/clang
you're quite right, I've added a check:
https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
However, that's not the problem I'm having on OSX. It opens but it crashes on loading. I suspect one of the datatypes has a slightly different size on clang/OSX compared to gcc/Linux.

Hieu Hoang
http://www.hoang.co.uk/hieu

On 13 October 2015 at 07:03, Jeroen Vermeulen <j...@precisiontranslationtools.com> wrote:

> On 10/12/2015 11:15 PM, Hieu Hoang wrote:
>
>> I'm not sure if anyone else encounters it, but the compact lexical reordering table crashes for me on OSX/clang during loading.
>>
>> The stack trace I have for this is:
>> LexicalReorderingTableCompact::LexicalReorderingTableCompact
>>   LexicalReorderingTableCompact::Load line 180
>>   StringVector::load line 2808
>>   StringVector::loadCharArray line 247
>
> Could the file simply not be open? It's opened in LexicalReorderingTableCompact::Load, but as far as I can see, *nothing* ever checks that this actually works. Code just keeps reading from the file and assuming success, and using the possibly invalid result. Maybe this just happens to be the first point where it causes a crash.
>
> Jeroen
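The missing check Jeroen describes - opening the table file and never testing whether the open succeeded - can be closed with a fail-fast helper along these lines (a hypothetical function, not the actual Moses code):

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Open a binary table file and fail immediately with a clear message
// if the open did not succeed, instead of letting later reads on an
// invalid stream produce garbage and a crash far from the real cause.
std::ifstream open_table_or_throw(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in.is_open())
        throw std::runtime_error("could not open table file: " + path);
    return in;
}
```

A thrown exception with the file path would have pointed straight at a missing or unreadable table, rather than a segfault deep inside StringVector::loadCharArray.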