Hi Patricia,

Unfortunately, I'm not very well versed in SRILM, so I'm not sure I can answer your question about the blank line appearing in your ARPA file. You could also try training your model directly with IRSTLM (in text format) and see whether the blank line also appears:
tlm -tr=<corpus> -lm=[wb|msb] -n=3 -o=complete_fr.truecased_unique_tok_irst.lm

(I'm not sure what your original parameters were for the SRI model.)

wb = Witten-Bell smoothing
msb = modified shift-beta smoothing

Best,
Nick

________________________________
From: Patricia Helmich [patriciahelm...@hotmail.com]
Sent: Tuesday, July 03, 2012 5:38 PM
To: Nicholas Ruiz
Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

Hi Nick,

OK, here are the first 10 lines of the BLM:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n complete_fr.truecased_unique_tok_clean.blm | head
     1  blmt 3 1091677 13524189 23061450
     2  1091677
     3  0
     4  ! 0
     5  " 0
     6  # 0
     7  $ 0
     8  % 0
     9  & 0
    10  ' 0

The third line seems to cause the problem. I deleted it in a copy of the BLM:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n complete_fr.truecased_unique_tok_clean_copy.blm | head
     1  blmt 3 1091677 13524189 23061450
     2  1091677
     3  ! 0
     4  " 0
     5  # 0
     6  $ 0
     7  % 0
     8  & 0
     9  ' 0
    10  '00 0

Then I computed the perplexity with the copy of the BLM, and it worked well:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean_copy.blm --eval /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
inpfile: complete_fr.truecased_unique_tok_clean_copy.blm
loading up to the LM level 1000 (if any)
dub: 10000000
Language Model Type of complete_fr.truecased_unique_tok_clean_copy.blm is 1
blmt
loadbin()
lmtable::loadbin_dict()
dict->size(): 1091677
loadbin_level (level 1)
loading 1091677 1-grams
done (level1)
loadbin_level (level 2)
loading 13524189 2-grams
done (level2)
loadbin_level (level 3)
loading 23061450 3-grams
done (level3)
done
OOV code is 218080
Start Eval
OOV code: 218080
%% Nw=58714 PP=1.03 PPwp=0.03 Nbo=58713 Noov=105 OOV=0.18%
lmtable class statistics
levels 3
lev 1 entries 1091677 used mem 15.62Mb
lev 2 entries 13524189 used mem 193.47Mb
lev 3 entries 23061450 used mem 153.95Mb
total allocated mem 363.03Mb
total number of get and binary search calls
level 1 get: 58714 bsearch: 0
level 2 get: 58713 bsearch: 117425
level 3 get: 58712 bsearch: 0

The LM also contains this empty line:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n complete_fr.truecased_unique_tok_clean.lm | head
     1
     2  \data\
     3  ngram 1=1091677
     4  ngram 2=13524189
     5  ngram 3=23061450
     6
     7  \1-grams:
     8  -7.154682 -0.1456359
     9  -3.339167 ! -1.472732
    10  -2.43139 " -0.733331

but it does not cause any problems in phrase training or in the perplexity computation with the LM. I also wonder why the LM contains an entry for an empty line at all, because I checked my French corpus and it does not contain any empty lines.

Best,
Patricia

> From: nicr...@fbk.eu
> To: patriciahelm...@hotmail.com
> Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
> Date: Tue, 3 Jul 2012 14:59:57 +0000
>
> Hi Patricia,
>
> Could you also send me the top 10 lines of your binarized LM?
>
> head complete_fr.truecased_unique_tok_clean.blm
>
> Thanks,
> Nick
>
> ________________________________
> From: Patricia Helmich [patriciahelm...@hotmail.com]
> Sent: Tuesday, July 03, 2012 4:40 PM
> To: Nicholas Ruiz; moses-support@mit.edu
> Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
>
> Hi Nick,
>
> For
>
> /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.lm --eval /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
>
> I get the following output:
>
> inpfile: complete_fr.truecased_unique_tok_clean.lm
> loading up to the LM level 1000 (if any)
> dub: 10000000
> Language Model Type of complete_fr.truecased_unique_tok_clean.lm is 1
> \data\
> loadtxt_ram()
> 1-grams: reading 1091677 entries
> done level1
> 2-grams: reading 13524189 entries
> ..done level2
> 3-grams: reading 23061450 entries
> ....done level3
> done
> OOV code is 218081
> OOV code is 218081
> Start Eval
> OOV code: 218081
> %% Nw=58714 PP=201.88 PPwp=5.70 Nbo=19233 Noov=105 OOV=0.18%
> lmtable class statistics
> levels 3
> lev 1 entries 1091677 used mem 15.62Mb
> lev 2 entries 13524189 used mem 193.47Mb
> lev 3 entries 23061450 used mem 153.95Mb
> total allocated mem 363.03Mb
> total number of get and binary search calls
> level 1 get: 3042 bsearch: 0
> level 2 get: 58713 bsearch: 23178875
> level 3 get: 58712 bsearch: 55672
>
> For
>
> /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.blm --eval /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
>
> I get the same error as in the phrase training:
>
> inpfile: complete_fr.truecased_unique_tok_clean.blm
> loading up to the LM level 1000 (if any)
> dub: 10000000
> Language Model Type of complete_fr.truecased_unique_tok_clean.blm is 1
> blmt
> loadbin()
> lmtable::loadbin_dict()
> dictionary::loadtxt wrong entry was found (0) in position 1
>
> Best,
> Patricia
>
> > From: nicr...@fbk.eu
> > To: patriciahelm...@hotmail.com; moses-support@mit.edu
> > Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
> > Date: Tue, 3 Jul 2012 13:29:26 +0000
> >
> > Hi Patricia,
> >
> > Could you try computing the perplexity of your binarized LM with compile-lm?
> >
> > First on the ARPA format (SRILM):
> > /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.lm --eval <text-to-eval>
> >
> > and then on the binarized version (before your symbolic link):
> > /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.blm --eval <text-to-eval>
> >
> > It might be easier to debug by first looking at the direct output from IRSTLM.
> >
> > Thanks,
> > Nick
> >
> > ________________________________
> > From: moses-support-boun...@mit.edu [moses-support-boun...@mit.edu] on behalf of Patricia Helmich [patriciahelm...@hotmail.com]
> > Sent: Tuesday, July 03, 2012 3:07 PM
> > To: moses-support@mit.edu
> > Subject: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
> >
> > Hi,
> > I am using Moses in combination with SRILM and IRSTLM for several language pairs.
> > After building LMs with SRILM and training the phrase model, I translate a sentence, for example:
> >
> > echo "this is a small house" | /home/lingua/smt/moses/bin/moses -f model/moses.ini
> >
> > This works well for each language pair.
> >
> > Then I produce an IRSTLM binary LM for each language pair, for example:
> >
> > /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.lm complete_fr.truecased_unique_tok_clean.blm
> > ln -s complete_fr.truecased_unique_tok_clean.blm complete_fr.truecased_unique_tok_clean.blm.mm
> >
> > and I produce binary phrase tables and binary reordering tables:
> >
> > gzip -cd fr-en/f_en.e_fr/model/phrase-table.gz | LC_ALL=C sort | /home/lingua/smt/moses/bin/processPhraseTable -ttable 0 0 - -nscores 5 -out fr-en/f_en.e_fr/model/phrase-table
> > gzip -cd fr-en/f_en.e_fr/model/reordering-table.wbe-msd-bidirectional-fe.gz | LC_ALL=C sort | /home/lingua/smt/moses/bin/processLexicalTable -out fr-en/f_en.e_fr/model/reordering-table
> >
> > Then I create a copy of moses.ini (moses-bin.ini) and set moses-bin.ini to use the binary files.
> >
> > Now I try to translate a sentence with:
> >
> > echo "this is a small house" | TMP=/tmp /home/lingua/smt/moses/bin/moses -v 2 -f model/moses-bin.ini
> >
> > This works well for each language pair except the pair f: en, e: fr.
> >
> > The output is:
> >
> > Defined parameters (per moses.ini or switch):
> > config: model/moses-bin.ini
> > distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/lingua/Patricia/Corpora/Corpora_Biling/fr-en/f_en.e_fr/model/reordering-table
> > distortion-limit: 6
> > input-factors: 0
> > lmodel-file: 1 0 3 /home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm
> > mapping: 0 T 0
> > ttable-file: 1 0 0 5 /home/lingua/Patricia/Corpora/Corpora_Biling/fr-en/f_en.e_fr/model/phrase-table
> > ttable-limit: 20
> > verbose: 2
> > weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
> > weight-l: 0.5000
> > weight-t: 0.20 0.20 0.20 0.20 0.20
> > weight-w: -1
> > input type is: text input
> > Loading lexical distortion models...have 1 models
> > Creating lexical reordering...
> > weights: 0.300 0.300 0.300 0.300 0.300 0.300
> > binary file loaded, default OFF_T: -1
> > Start loading LanguageModel /home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm : [0.000] seconds
> > In LanguageModelIRST::Load: nGramOrder = 3
> > Language Model Type of /home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm is 1
> > blmt
> > loadbin()
> > lmtable::loadbin_dict()
> > dictionary::loadtxt wrong entry was found (0) in position 1
> >
> > I don't understand the reason for this error. Could you help me with this problem?
> >
> > Thank you,
> > Patricia

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
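The blank entry in this thread shows up on line 8 of the ARPA file and, after binarization, on line 3 of the BLM file. It can be located without paging through head output. Hand-editing the BLM may only hide the problem: the edited copy loaded, but its perplexity (PP=1.03) is far from the ARPA result on the same test set (PP=201.88), which suggests the dictionary may no longer line up with the n-gram tables. Below is a minimal Python sketch, not part of SRILM or IRSTLM, that scans the \1-grams: section of an ARPA file and reports entries whose word field is empty or whitespace-only. It assumes SRILM-style tab-separated fields (logprob, TAB, word, optional TAB and backoff). Lines without a tab are skipped. The script name find_blank_unigrams.py used below is hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def blank_unigrams(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each 1-gram entry whose word field is
    empty or consists only of whitespace.

    Assumption: SRILM-style ARPA text with tab-separated fields,
    "logprob<TAB>word[<TAB>backoff]". Lines without a tab are skipped,
    because an empty field cannot be told apart from a separator there.
    """
    in_unigrams = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, ..., \end\
            in_unigrams = line.strip() == "\\1-grams:"
            continue
        if not in_unigrams or "\t" not in line:
            continue
        fields = line.split("\t")
        # str.strip() also removes Unicode whitespace such as U+00A0,
        # so a token made only of such characters is reported as blank.
        if not fields[1].strip():
            yield lineno, line


def main(paths: List[str]) -> int:
    status = 0
    for path in paths:
        # surrogateescape keeps undecodable bytes instead of raising
        with open(path, encoding="utf-8", errors="surrogateescape") as f:
            hits = list(blank_unigrams(f))
        for lineno, line in hits:
            # repr() makes invisible characters (tabs, U+00A0, ...) visible
            print(f"{path}:{lineno}: {line!r}")
        if hits:
            status = 1
    return status


# Run only when file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

For example, python3 find_blank_unigrams.py complete_fr.truecased_unique_tok_clean.lm prints each suspicious unigram with its line number and exits with status 1 if any were found. Printing the repr of the line is a deliberate choice: if the "empty" word is actually an invisible character rather than a truly empty field, the escaped form shows which character it is, which may help explain why the entry appears even though the corpus has no empty lines.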