Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

Nicholas Ruiz Tue, 03 Jul 2012 08:47:46 -0700

Hi Patricia,

Unfortunately, I'm not very familiar with SRILM, so I'm not sure I can answer
your question about the blank line in your ARPA file. You could also try
training your model directly with IRSTLM (in text format) and see whether
the blank line appears there too.


tlm -tr=<corpus> -lm=[wb|msb] -n=3 -o=complete_fr.truecased_unique_tok_irst.lm

(I'm not sure what your original parameters were for the SRILM model.)
wb = Witten-Bell smoothing
msb = modified shift-beta smoothing

Best,
Nick

________________________________
From: Patricia Helmich [patriciahelm...@hotmail.com]
Sent: Tuesday, July 03, 2012 5:38 PM
To: Nicholas Ruiz
Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

Hi Nick,

ok, here are the first 10 lines of the BLM:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n complete_fr.truecased_unique_tok_clean.blm | head
     1  blmt 3 1091677 13524189 23061450
     2  1091677
     3
         0
     4  ! 0
     5  " 0
     6  # 0
     7  $ 0
     8  % 0
     9  & 0
    10  ' 0
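Note that cat -n numbers only newline-terminated lines, so entry 3 spilling onto a second, unnumbered screen line suggests (not confirmed here) that its word is a character that moves the cursor without being a newline, such as a vertical tab or form feed. A minimal sketch to print the raw bytes of the first dictionary lines follows. The helper names are hypothetical, not part of IRSTLM, and it assumes, based only on the listing above, that line 1 is the blmt header, line 2 the dictionary size, and each later line a plain-text "word count" pair. It flags entries that do not split into exactly two whitespace-separated fields; whether IRSTLM's dictionary loader splits lines the same way is an assumption.

```python
import itertools
import sys


def check_dictionary_lines(lines, first_entry=3):
    """Return (line number, raw bytes) for dictionary lines that do not split
    into exactly two fields, a word and a count.

    Assumption: line 1 is the "blmt ..." header, line 2 the dictionary size,
    and lines from `first_entry` on are plain-text "word count" entries.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        if lineno < first_entry:
            continue
        body = raw.rstrip(b"\n")
        # bytes.split() with no argument splits on runs of ASCII whitespace:
        # space, \t, \n, \r, \v (0x0b) and \f (0x0c). A word made only of
        # \v or \f therefore vanishes and leaves a single field.
        if len(body.split()) != 2:
            problems.append((lineno, body))
    return problems


def read_head(path, n=10):
    """Read the first n newline-terminated lines of a file as bytes."""
    with open(path, "rb") as f:
        return list(itertools.islice(f, n))


if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, body in check_dictionary_lines(read_head(sys.argv[1])):
        # repr() makes non-printing bytes such as b"\x0b" visible.
        print(f"line {lineno}: {body!r}")
```

Saved under a hypothetical name such as check_blm_dict.py, it would be run as `python check_blm_dict.py complete_fr.truecased_unique_tok_clean.blm`. Only the first few lines are read, so the binary n-gram part of the file is never touched.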



It seems that the third line causes the problem. I deleted it in a copy of the
BLM:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n complete_fr.truecased_unique_tok_clean_copy.blm | head
     1  blmt 3 1091677 13524189 23061450
     2  1091677
     3  ! 0
     4  " 0
     5  # 0
     6  $ 0
     7  % 0
     8  & 0
     9  ' 0
    10  '00 0

Then I computed the perplexity with the copy of the BLM, and it ran without
the error:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean_copy.blm --eval /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
inpfile: complete_fr.truecased_unique_tok_clean_copy.blm
loading up to the LM level 1000 (if any)
dub: 10000000
Language Model Type of complete_fr.truecased_unique_tok_clean_copy.blm is 1
blmt
loadbin()
lmtable::loadbin_dict()
dict->size(): 1091677
loadbin_level (level 1)
loading 1091677 1-grams
done (level1)
loadbin_level (level 2)
loading 13524189 2-grams
done (level2)
loadbin_level (level 3)
loading 23061450 3-grams
done (level3)
done
OOV code is 218080
Start Eval
OOV code: 218080
%% Nw=58714 PP=1.03 PPwp=0.03 Nbo=58713 Noov=105 OOV=0.18%
lmtable class statistics
levels 3
lev 1 entries 1091677 used mem 15.62Mb
lev 2 entries 13524189 used mem 193.47Mb
lev 3 entries 23061450 used mem 153.95Mb
total allocated mem 363.03Mb
total number of get and binary search calls
level 1 get: 58714 bsearch: 0
level 2 get: 58713 bsearch: 117425
level 3 get: 58712 bsearch: 0

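The %% line reports, among other things, the number of evaluated words (Nw), the perplexity (PP), the OOV count (Noov) and the OOV rate. As a hedged sketch using the conventional definition (not IRSTLM's code, whose handling of OOVs and sentence boundaries is not shown in this thread), perplexity is 10 raised to the negative mean log10 probability per word, and the OOV rate is Noov/Nw. The helper names are hypothetical.

```python
def perplexity(log10_probs):
    """Perplexity = 10 ** (-(1/N) * sum of log10 P(w_i | history))."""
    if not log10_probs:
        raise ValueError("need at least one word")
    return 10 ** (-sum(log10_probs) / len(log10_probs))


def oov_rate(n_oov, n_words):
    """Out-of-vocabulary words as a percentage of all evaluated words."""
    return 100.0 * n_oov / n_words


def mean_word_prob(pp):
    """Geometric-mean per-word probability implied by a perplexity value."""
    return 1.0 / pp


if __name__ == "__main__":
    # Figures copied from the compile-lm output in this thread.
    print(f"OOV={oov_rate(105, 58714):.2f}%")  # OOV=0.18%
    print(f"PP=201.88 -> {mean_word_prob(201.88):.4f}")
    print(f"PP=1.03   -> {mean_word_prob(1.03):.4f}")
```

Read this way, PP=1.03 corresponds to a geometric-mean probability of about 0.97 per word, compared with about 0.005 for the PP=201.88 that compile-lm reports for the ARPA file in the earlier message.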

The LM also contains this empty line:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n complete_fr.truecased_unique_tok_clean.lm | head
     1
     2  \data\
     3  ngram 1=1091677
     4  ngram 2=13524189
     5  ngram 3=23061450
     6
     7  \1-grams:
     8  -7.154682
                                -0.1456359
     9  -3.339167       !       -1.472732
    10  -2.43139        "       -0.733331

but it does not cause any problems in phrase training or in the perplexity
computation with the LM.

Also, I wonder why the LM contains an entry for an empty line, since I checked
my French corpus and it does not contain any empty lines.
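A check for empty lines would miss a token that only looks blank. One hedged way to look for such tokens, assuming the corpus is UTF-8 and assuming (not confirmed in this thread) that the blank-looking entries come from a non-printing character rather than an actual empty line, is to list every corpus line containing whitespace or control characters other than space and tab. The function names are hypothetical, not part of Moses, SRILM or IRSTLM.

```python
import sys
import unicodedata


def find_hidden_whitespace(lines):
    """Return (line number, character names) for lines that contain
    whitespace or control characters other than space, tab and the final
    newline, e.g. vertical tab, form feed, no-break space or U+2028."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        names = []
        for ch in line.rstrip("\n"):
            if ch in " \t":
                continue
            if ch.isspace() or unicodedata.category(ch) == "Cc":
                # Control characters have no Unicode name, so fall back to U+XXXX.
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                if name not in names:
                    names.append(name)
        if names:
            hits.append((lineno, names))
    return hits


def scan_file(path):
    # Binary mode splits only on b"\n", so \r, \v and \f stay inside a line.
    with open(path, "rb") as f:
        lines = (raw.decode("utf-8", errors="replace") for raw in f)
        return find_hidden_whitespace(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, names in scan_file(sys.argv[1]):
        print(lineno, ", ".join(names))
```

A line that a tokenizer or LM toolkit splits on such a character may still look like ordinary text in an editor, which is why a line-based check can report no empty lines.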


Best, Patricia








> From: nicr...@fbk.eu
> To: patriciahelm...@hotmail.com
> Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
> Date: Tue, 3 Jul 2012 14:59:57 +0000
>
> Hi Patricia,
>
> Could you also send me the top 10 lines of your binarized LM?
>
> head complete_fr.truecased_unique_tok_clean.blm
>
> Thanks,
> Nick
>
> ________________________________
> From: Patricia Helmich [patriciahelm...@hotmail.com]
> Sent: Tuesday, July 03, 2012 4:40 PM
> To: Nicholas Ruiz; moses-support@mit.edu
> Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
>
> Hi Nick,
>
> For
>
> /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.lm --eval /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
>
> I get the following output:
>
> inpfile: complete_fr.truecased_unique_tok_clean.lm
> loading up to the LM level 1000 (if any)
> dub: 10000000
> Language Model Type of complete_fr.truecased_unique_tok_clean.lm is 1
> \data\
> loadtxt_ram()
> 1-grams: reading 1091677 entries
> done level1
> 2-grams: reading 13524189 entries
> ..done level2
> 3-grams: reading 23061450 entries
> ....done level3
> done
> OOV code is 218081
> OOV code is 218081
> Start Eval
> OOV code: 218081
> %% Nw=58714 PP=201.88 PPwp=5.70 Nbo=19233 Noov=105 OOV=0.18%
> lmtable class statistics
> levels 3
> lev 1 entries 1091677 used mem 15.62Mb
> lev 2 entries 13524189 used mem 193.47Mb
> lev 3 entries 23061450 used mem 153.95Mb
> total allocated mem 363.03Mb
> total number of get and binary search calls
> level 1 get: 3042 bsearch: 0
> level 2 get: 58713 bsearch: 23178875
> level 3 get: 58712 bsearch: 55672
>
>
>
> For
>
> /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.blm --eval /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
>
> I get the same error as when translating with Moses:
>
> inpfile: complete_fr.truecased_unique_tok_clean.blm
> loading up to the LM level 1000 (if any)
> dub: 10000000
> Language Model Type of complete_fr.truecased_unique_tok_clean.blm is 1
> blmt
> loadbin()
> lmtable::loadbin_dict()
> dictionary::loadtxt wrong entry was found (0) in position 1
>
>
>
> Best,
> Patricia
>
>
>
>
>
>
> > From: nicr...@fbk.eu
> > To: patriciahelm...@hotmail.com; moses-support@mit.edu
> > Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
> > Date: Tue, 3 Jul 2012 13:29:26 +0000
> >
> > Hi Patricia,
> >
> > Could you try computing the perplexity of your binarized LM with compile-lm?
> >
> > First on the ARPA format (SRILM):
> > /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.lm --eval <text-to-eval>
> >
> > and then on the binarized version (before your symbolic link):
> > /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.blm --eval <text-to-eval>
> >
> > It might be easier to debug by first looking at the direct output from 
> > IRSTLM.
> >
> > Thanks,
> > Nick
> >
> >
> > ________________________________
> > From: moses-support-boun...@mit.edu [moses-support-boun...@mit.edu] on behalf of Patricia Helmich [patriciahelm...@hotmail.com]
> > Sent: Tuesday, July 03, 2012 3:07 PM
> > To: moses-support@mit.edu
> > Subject: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1
> >
> > Hi,
> > I am using Moses in combination with SRILM and IRSTLM for several language 
> > pairs.
> > After building LMs with SRILM and training the phrase model, I try to 
> > translate a sentence, for example:
> >
> > echo "this is a small house" | /home/lingua/smt/moses/bin/moses -f model/moses.ini
> >
> > This works well for each language pair.
> >
> > Then I produce an IRSTLM binary LM for each language pair, for example:
> >
> > /home/lingua/smt/irstlm/bin/compile-lm complete_fr.truecased_unique_tok_clean.lm complete_fr.truecased_unique_tok_clean.blm
> > ln -s complete_fr.truecased_unique_tok_clean.blm complete_fr.truecased_unique_tok_clean.blm.mm
> >
> > and I produce binary phrase tables and binary reordering tables:
> >
> > gzip -cd fr-en/f_en.e_fr/model/phrase-table.gz | LC_ALL=C sort | /home/lingua/smt/moses/bin/processPhraseTable -ttable 0 0 - -nscores 5 -out fr-en/f_en.e_fr/model/phrase-table
> > gzip -cd fr-en/f_en.e_fr/model/reordering-table.wbe-msd-bidirectional-fe.gz | LC_ALL=C sort | /home/lingua/smt/moses/bin/processLexicalTable -out fr-en/f_en.e_fr/model/reordering-table
> >
> > Then I copy moses.ini to moses-bin.ini and edit moses-bin.ini so that it
> > uses the binary files.
> >
> >
> > Now I try to translate a sentence with:
> >
> > echo "this is a small house" | TMP=/tmp /home/lingua/smt/moses/bin/moses -v 2 -f model/moses-bin.ini
> >
> >
> > This works for every language pair except f: en, e: fr (English to French).
> >
> > The output is:
> >
> > Defined parameters (per moses.ini or switch):
> > config: model/moses-bin.ini
> > distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/lingua/Patricia/Corpora/Corpora_Biling/fr-en/f_en.e_fr/model/reordering-table
> > distortion-limit: 6
> > input-factors: 0
> > lmodel-file: 1 0 3 /home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm
> > mapping: 0 T 0
> > ttable-file: 1 0 0 5 /home/lingua/Patricia/Corpora/Corpora_Biling/fr-en/f_en.e_fr/model/phrase-table
> > ttable-limit: 20
> > verbose: 2
> > weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
> > weight-l: 0.5000
> > weight-t: 0.20 0.20 0.20 0.20 0.20
> > weight-w: -1
> > input type is: text input
> > Loading lexical distortion models...have 1 models
> > Creating lexical reordering...
> > weights: 0.300 0.300 0.300 0.300 0.300 0.300
> > binary file loaded, default OFF_T: -1
> > Start loading LanguageModel /home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm : [0.000] seconds
> > In LanguageModelIRST::Load: nGramOrder = 3
> > Language Model Type of /home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm is 1
> > blmt
> > loadbin()
> > lmtable::loadbin_dict()
> > dictionary::loadtxt wrong entry was found (0) in position 1
> >
> > I don't understand the reason for this error. Could you help me with this 
> > problem?
> >
> > Thank you,
> > Patricia
> >

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
