Re: [Moses-support] Number of Unique Hypotheses in the N-best List
Hello again,

On Tue, Feb 24, 2015 at 10:18 PM, Rico Sennrich wrote:
did you actually cut away the scores? It's possible that you have duplicates with different scores, so they will show up as different lines with 'sort | uniq', but will be merged if you do 'cut -d'|' -f4 | sort | uniq' as Matthias suggested.

Yes, the numbers I reported were on pure text; there were no scores. I used awk instead of cut to strip the scores, which produces basically the same result.

On Tue, Feb 24, 2015 at 10:03 PM, Matthias Huck mh...@inf.ed.ac.uk wrote:
Also note that n-best-factor takes effect only if distinct is active.

I tried that before reading your reply, and I confirm this on Moses v3.0. Please try to investigate what's going on (if you have the time).

On Tue, Feb 24, 2015 at 10:38 PM, Hieu Hoang wrote:
the decoding may have changed but the decoding algorithms should be exactly the same. The scores should be exactly the same (apart from rounding differences and OOV words, which shouldn't affect the search at all). If you have any evidence that you're getting different output, please let me know. It would be good if you can provide the model files so I can replicate the result.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Number of Unique Hypotheses in the N-best List
On 24/02/15 19:08, Erinç Dikici wrote:
(Apparently the Gmane web interface turned my reply into garbled text, sorry for the double posting)

Thanks again for your quick answers.

Yes, 32 and 2 are the counts after sort | uniq | wc -l. The total number of hypotheses returned for both cases was 50.

I removed the distincts from (my local copy of) scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the problem! Now I can get 32 unique hypotheses with v3.0, too.

In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a 50-best list) with the same configuration back in version 0.x. I hope the new -n-best-factor will do the trick.

the decoding may have changed but the decoding algorithms should be exactly the same. The scores should be exactly the same (apart from rounding differences and OOV words, which shouldn't affect the search at all). If you have any evidence that you're getting different output, please let me know. It would be good if you can provide the model files so I can replicate the result.

Best, ED

--
Hieu Hoang
Research Associate (until March 2015)
** searching for interesting commercial MT position **
University of Edinburgh
http://www.hoang.co.uk/hieu
Re: [Moses-support] Number of Unique Hypotheses in the N-best List
somewhere between 2.1 and 3.0, the keyword 'distinct' was

Oops, that was me. And it wasn't intended. I'm using this for my own setups and apparently copied it to master when I added some other stuff. Hope I didn't mess up other people's experiments. It's been in master since 7 August 2014 already and nobody noticed.

Sorry for that, you can remove it again if you want: lines 1280 and 1282 of scripts/training/mert-moses.pl.

I'd assume that your 32 entries of the n-best list weren't actually unique, though, but a number of duplicates of the (two) very same outputs, as distinct should simply avoid duplicate entries.

Here's a link to a related previous discussion on this mailing list:
http://comments.gmane.org/gmane.comp.nlp.moses.user/11097

You can try the parameter n-best-factor.
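To illustrate how distinct and n-best-factor interact, here is a toy sketch in Python. It is not Moses code: the function name, its parameters and the sample candidate list are made up. It rests on the assumption (from the Moses option description, as far as I understand it) that with distinct active the decoder examines at most N * n-best-factor candidates in best-first order and keeps only surface strings it has not seen yet.

```python
def distinct_nbest(ranked_candidates, n, factor):
    """Toy model (hypothetical helper, not the Moses implementation).

    Keeps up to n distinct strings while looking at no more than
    n * factor candidates from a best-first ranked list.
    """
    seen = set()
    result = []
    for examined, hyp in enumerate(ranked_candidates):
        # Stop once the list is full or the candidate budget is used up.
        if len(result) >= n or examined >= n * factor:
            break
        if hyp not in seen:
            seen.add(hyp)
            result.append(hyp)
    return result


# Invented candidates, best first; many share the same surface string.
ranked = ["a b", "a b", "a b", "a c", "a b", "a d", "a e"]

small = distinct_nbest(ranked, n=3, factor=1)  # budget: 3 candidates
large = distinct_nbest(ranked, n=3, factor=2)  # budget: 6 candidates
print(small)
print(large)
```

With factor=1 only one distinct string survives here, because the first three candidates are identical; a larger factor lets the search look further down the list before giving up.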
Re: [Moses-support] Number of Unique Hypotheses in the N-best List
Matthias Huck mhuck@... writes:

Hi Erinç,

On Tue, 2015-02-24 at 16:24 +, Matthias Huck wrote:
I'd assume that your 32 entries of the n-best list weren't actually unique, though, but a number of duplicates of the (two) very same outputs, as distinct should simply avoid duplicate entries.

Actually, could you please check for us whether I'm right with this assumption? If I'm not, then some other modification since version 2.1 might affect your experiment. I hope that's not the case.

Run something like

cut -d'|' -f4 | sort | uniq | wc -l

on the n-best list with 32 entries. It should print 2.

Or did you do this already? (You're mentioning unique hypotheses in your mail.)

Thanks again for your quick answers.

Yes, 32 and 2 are the counts after sort | uniq | wc -l. The total number of hypotheses returned for both cases was 50.

I removed the distincts from (my local copy of) scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the problem! Now I can get 32 unique hypotheses with v3.0, too.

In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a 50-best list) with the same configuration back in version 0.x. I hope the new -n-best-factor will do the trick.

Best, ED
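The check above can also be sketched in Python, counting distinct hypothesis strings per sentence ID rather than over the whole file. This assumes the usual `id ||| hypothesis ||| feature scores ||| total score` layout of a Moses n-best file; the function name and the sample lines are invented for illustration.

```python
from collections import defaultdict


def count_unique_hypotheses(nbest_lines):
    """Return {sentence_id: number of distinct hypothesis strings}.

    Scores are ignored, so duplicates that differ only in their
    scores are counted once.
    """
    unique = defaultdict(set)
    for line in nbest_lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sent_id, hypothesis = fields[0], fields[1]
        unique[sent_id].add(hypothesis)
    return {sid: len(hyps) for sid, hyps in unique.items()}


# Made-up n-best lines: the first two differ only in their scores.
sample = [
    "0 ||| the house is small ||| LM0= -10.1 ||| -5.20",
    "0 ||| the house is small ||| LM0= -10.1 ||| -5.31",
    "0 ||| the house is little ||| LM0= -11.0 ||| -5.90",
    "1 ||| hello world ||| LM0= -3.2 ||| -1.10",
]

counts = count_unique_hypotheses(sample)
print(counts)
```

Comparing whole lines (as 'sort | uniq' on the raw file does) would treat all four sample lines as different, whereas comparing only the hypothesis field merges the first two.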