Re: [Moses-support] IRSTLM: Trash sentences getting more probability scores than proper grammatical sentences

2016-03-23 Thread Kenneth Heafield
kangaroo is less probable than snake.  Which more than explains the
difference you observed.  Film at 11.

That p(<unk>) is pretty high.  What happened when you used lmplz to
build the model?

Kenneth
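
The arithmetic behind this reply can be checked from the per-word log10 scores in the quoted query output. What follows is a minimal Python sketch added for illustration, not something posted in the thread: the two score lists are copied from that output, and the helper name `perplexity` is made up here.

```python
# Per-word log10 probabilities copied from the quoted `query` output.
kangaroo_sentence = [
    -4.08843, -2.51627, -0.58336, -0.764257, -2.58353, -1.95033, -3.910977,
    -1.15596, -1.55485, -2.31963, -0.886832, -5.3615108, -1.24128, -0.00203508,
]
# The first seven entries are the out-of-vocabulary (OOV) tokens.
snake_sentence = [
    -4.0025997, -2.23153, -2.23153, -2.23153, -2.23153, -2.23153, -2.23153,
    -1.42496, -1.9045, -2.31963, -0.886832, -3.16116, -0.793541, -0.0047327,
]


def perplexity(log10_probs):
    """Perplexity is 10 raised to the negative mean log10 probability."""
    return 10 ** (-sum(log10_probs) / len(log10_probs))


total_kangaroo = sum(kangaroo_sentence)
total_snake = sum(snake_sentence)

# How much the second sentence wins by overall.
gap_total = total_snake - total_kangaroo
# How much of that comes from "snake" versus "kangaroo" alone (index 11 in both lists).
gap_last_word = snake_sentence[11] - kangaroo_sentence[11]
# Probability the model assigns to each of the later OOV tokens.
p_unk = 10 ** snake_sentence[1]

print(f"totals: {total_kangaroo:.5f} vs {total_snake:.5f}")
print(f"gap in totals: {gap_total:.3f}; snake vs kangaroo alone: {gap_last_word:.3f}")
print(f"p(<unk>) = {p_unk:.4f}")
print(f"perplexity: {perplexity(kangaroo_sentence):.2f} vs {perplexity(snake_sentence):.2f}")
print(f"perplexity excluding OOVs: {perplexity(snake_sentence[7:]):.2f}")
```

The single-word gap between "snake" and "kangaroo" (about 2.2 in log10) is larger than the gap between the two totals (about 1.03), which is the point made above.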

On 03/23/2016 09:28 AM, Bhat Irshad wrote:
> I built a language model using IRSTLM on 20 million tokenized English
> sentences and tested it on the following two sentences:
> 
> 1. Yesterday when I was walking towards home , I saw a kangaroo .
> 2. smdnbs sadb jghsa sdabasd asasd tsados hasdb , I saw a snake .
> 
> As we can see, the first portion of the second sentence is complete trash,
> while the first sentence is a proper grammatical one. I was surprised to see
> that the second sentence got a higher probability score (-27.887135) than
> the first one (-28.91925).
> 
> I guess this happened due to back-off, though I am not sure.
> 
> echo 'Yesterday when I was walking towards home , I saw a kangaroo .' |
> /usr/bin/query english-lcc-ilci-ukwac-tok-20M-n3.blm 2> /tmp/a
> Yesterday=126222 2 -4.08843  when=409 3 -2.51627  I=260 3 -0.58336
> was=771 3 -0.764257  walking=1624 3 -2.58353  towards=1335 3 -1.95033
> home=388 2 -3.910977  ,=209 3 -1.15596  I=260 3 -1.55485
> saw=4411 3 -2.31963  a=131 3 -0.886832  kangaroo=106652 2 -5.3615108
> .=10 3 -1.24128  </s>=11 3 -0.00203508  Total: -28.91925 OOV: 0
> Perplexity including OOVs: 116.32170228822577
> Perplexity excluding OOVs: 116.32170228822577
> OOVs: 0
> Tokens: 14
> 
> echo 'smdnbs sadb jghsa sdabasd asasd tsados hasdb , I saw a snake .' |
> /usr/bin/query english-lcc-ilci-ukwac-tok-20M-n3.blm 2> /tmp/a
> smdnbs=0 1 -4.0025997  sadb=0 1 -2.23153  jghsa=0 1 -2.23153
> sdabasd=0 1 -2.23153  asasd=0 1 -2.23153  tsados=0 1 -2.23153
> hasdb=0 1 -2.23153  ,=209 1 -1.42496  I=260 2 -1.9045
> saw=4411 3 -2.31963  a=131 3 -0.886832  snake=3768 3 -3.16116
> .=10 3 -0.793541  </s>=11 3 -0.0047327  Total: -27.887135 OOV: 7
> Perplexity including OOVs: 98.16082104257269
> Perplexity excluding OOVs: 31.57449745907425
> OOVs: 7
> Tokens: 14
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 


Re: [Moses-support] IRSTLM

2016-01-19 Thread Matthias Huck
Hi,

I believe that the "~" might be the culprit. Try:

./bjam --with-irstlm=/home/mty2015/Public/MTEngine/Moseshome/mosesdecoder/irstlm

(If this is the correct absolute path to your IRSTLM installation.)
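
One way to see why the "~" can matter: shells generally expand a "~" only at the start of a word, so in `--with-irstlm=~/...` the build tool may receive the literal character. The sketch below illustrates the same rule with Python's `os.path.expanduser`, which also expands only a leading "~". It is an illustration under that assumption, not a statement about how bjam parses its options, and the helper `expand_option_path` is hypothetical.

```python
import os.path


def expand_option_path(option: str) -> str:
    """Expand a leading ~ in the value part of a --name=value option.

    Hypothetical helper for illustration; not part of bjam or Moses.
    """
    name, sep, value = option.partition("=")
    return name + sep + os.path.expanduser(value)


raw = "--with-irstlm=~/Public/MTEngine/Moseshome/mosesdecoder/irstlm"

# The whole option does not start with "~", so expanduser leaves it untouched.
unchanged = os.path.expanduser(raw)

# Expanding only the value part replaces "~" with the home directory.
fixed = expand_option_path(raw)

print(unchanged)
print(fixed)
```

Passing the expanded, absolute form on the command line sidesteps the question entirely, which is what the suggestion above amounts to.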

Cheers,
Matthias


On Wed, 2016-01-20 at 00:32 +, Hieu Hoang wrote:
> it's likely there was an error when you compiled irstlm as the irstlm
> library cannot be found.
> 
> can i ask - why do you need IRSTLM? for most cases, KenLM is faster.
> It's built into Moses so there's no external libraries you have to
> compile
> 
> On 20/01/16 00:27, Ouafa Benterki wrote:
> > Hi ,
> > 
> > Please find enclosed attached the build log, here's the command i
> > run
> > ./bjam--with-irstlm=~/Public/MTEngine/Moseshome/mosesdecoder/irstlm
> > 
> > best
> > 
> > Ouafa


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] IRSTLM

2016-01-19 Thread Hieu Hoang
it's likely there was an error when you compiled irstlm as the irstlm 
library cannot be found.


can i ask - why do you need IRSTLM? for most cases, KenLM is faster. 
It's built into Moses so there's no external libraries you have to compile


On 20/01/16 00:27, Ouafa Benterki wrote:

Hi ,

Please find enclosed attached the build log, here's the command i run
./bjam--with-irstlm=~/Public/MTEngine/Moseshome/mosesdecoder/irstlm

best

Ouafa

On Tue, Jan 19, 2016 at 7:59 PM, > wrote:



Today's Topics:

   1. Building Moses on El Capitan? (Jake Ballinger)
   2. Re: Building Moses on El Capitan? (Hieu Hoang)


--

Message: 1
Date: Tue, 19 Jan 2016 13:55:30 -0500
From: Jake Ballinger >
Subject: [Moses-support] Building Moses on El Capitan?
To: moses-support@mit.edu 
Message-ID:
   
>

Content-Type: text/plain; charset="utf-8"

Hello,

I'm trying to build Moses on OS X El Capitan and I can't seem to
get it
right. I've attached my build log, and the last command I used
was: ./bjam
toolset=clang

--with-boost=~/Users/ballingerj/cscomp-ballingerj/mosesdecoder/build/boost/boost_1_59_0/tools/build/.

I think the problem has to with the Darwin on El Capitan, but I
cannot say
so for sure.

Thank you!

--
Jake Ballinger
Major: Computer Science
Minors: Chinese, French, Spanish, & Math
443-974-6184 
balling...@allegheny.edu 
Box 582

--

Message: 2
Date: Tue, 19 Jan 2016 18:59:39 +
From: Hieu Hoang >
Subject: Re: [Moses-support] Building Moses on El Capitan?
To: Jake Ballinger >
Cc: moses-support >
Message-ID:
   

Re: [Moses-support] IRSTLM installation

2016-01-18 Thread Matthias Huck
Hi,

Have you tried to use an absolute path?

Cheers,
Matthias


On Mon, 2016-01-18 at 02:52 +0100, Ouafa Benterki wrote:
> Hello,
> 
> I installed IRSTLM, but when I used the command
> ./bjam --with-irstlm=/path to irstlm/ the installation failed.
> Can you advise?
> 
> Best





Re: [Moses-support] IRSTLM ERROR

2012-11-12 Thread Nick Ruiz

Hi Clement,

The installation instructions for IRSTLM can be found here:
http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Installation_Guidelines
also, here is the instruction manual:
http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=User_Manual

First, try calling the following:
bash regenerate-makefiles.sh
(depending on the system, you may need to specify which shell 
interpreter is needed)


If all else fails, do you have GNU autotools installed?
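
The quoted error (`-I: command not found` followed by `aclocal failed`) would be consistent with the script not having found an `aclocal` executable, though the thread does not confirm this. A minimal Python sketch for checking whether the usual autotools programs are on the PATH is given here; the list of tool names is an assumption, since the messages only mention aclocal and "GNU autotools" in general.

```python
import shutil

# Assumed list of tools; the thread itself only names aclocal and "GNU autotools".
AUTOTOOLS = ["aclocal", "automake", "autoconf", "libtoolize"]


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found):
    """Return the names of the tools that were not found."""
    return [name for name, path in found.items() if path is None]


found = find_tools(AUTOTOOLS)
for name, path in found.items():
    print(f"{name:12s} {path or 'NOT FOUND'}")

if missing_tools(found):
    print("Install the missing tools before re-running regenerate-makefiles.sh")
```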

Best,
Nick

--
Nick Ruiz
Fondazione Bruno Kessler
FBK-Irst - Human Language Technology Unit
38050 Povo (TN) Italy

On 11/12/2012 06:04 PM, OYELEKE ODOJE wrote:

Hi Barry,

While installing IRSTLM (version irstlm-5.70.04),
the command ./regenerate-makefiles.sh generated the error below:

mrodoje@ubuntu:~/irstlm-5.70.04$ ./regenerate-makefiles.sh
Calling
Calling ...
./regenerate-makefiles.sh: line 52: -I: command not found aclocal failed

What do you think could be done to configure IRSTLM for the baseline
system?

Mosesdecoder and Giza-pp have been installed.

Clement Odoje
Department of Linguistics and African Languages

University of Ibadan,

Ibadan, Nigeria

+2348032387999
What you do today becomes history tomorrow, what will you be 
remembered for?





Re: [Moses-support] [IRSTLM] Segmentation fault when loading a model

2012-08-06 Thread Nicola Bertoldi
Dear Filip.

I am happy to help you, but your message gives me too little information.

By the way, I would like to move this thread to the IRSTLM mailing list
or even to a private thread.

How do you use the dictionary in your program?
Are you using it in incremental mode or not?

Could you please send me the piece of your code related to dictionary,
as well as the log of your debugging?

best regards,
Nicola Bertoldi
(IRSTLM development team)

On Aug 6, 2012, at 3:52 PM, Filip Petkovski wrote:

Hi,

I am using IRSTLM for making a language model and I got a segmentation fault
when I was trying to load a binary model trained using build-lm.sh and compiled 
using compile-lm.

I tracked the problem down to the dictionary::load(std::istream) method in the 
trunk/src/dictionary.cpp file.

As far as I could tell, there is an issue with initialization of a dictionary 
object and its member fields,
since the segmentation fault occurred when trying to access a member field of 
strstack in strstack::push(const char *)

I compiled my program with g++ -Wall -I$IRSTLM/include program.cpp -o program 
-L$IRSTLM/lib -lirstlm -lz

Best Regards,
Filip Petkovski


Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

2012-07-04 Thread Patricia Helmich

Hi,

there was indeed a vertical tab in the corpus.
Thanks to both of you!

Patricia



 From: bhad...@staffmail.ed.ac.uk
 To: moses-support@mit.edu; patriciahelm...@hotmail.com
 Subject: Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry 
 was found (0) in position 1
 Date: Tue, 3 Jul 2012 16:57:39 +0100
 
 Hi Patricia
 
 It looks like you have some odd characters in your corpus - perhaps vertical 
 tabs. You could use xxd on the lm file to try to figure out what it is,
 
 cheers - Barry
 

Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

2012-07-03 Thread Nicholas Ruiz
Hi Patricia,

Could you try computing the perplexity of your binarized LM with compile-lm?

First on the ARPA format (SRILM):
/home/lingua/smt/irstlm/bin/compile-lm 
complete_fr.truecased_unique_tok_clean.lm --eval text-to-eval

and then on the binarized version (before your symbolic link):
/home/lingua/smt/irstlm/bin/compile-lm 
complete_fr.truecased_unique_tok_clean.blm --eval text-to-eval

It might be easier to debug by first looking at the direct output from IRSTLM.

Thanks,
Nick



From: moses-support-boun...@mit.edu [moses-support-boun...@mit.edu] on behalf 
of Patricia Helmich [patriciahelm...@hotmail.com]
Sent: Tuesday, July 03, 2012 3:07 PM
To: moses-support@mit.edu
Subject: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was 
found (0) in position 1

Hi,
I am using Moses in combination with SRILM and IRSTLM for several language 
pairs.
After building LMs with SRILM and training the phrase model, I try to translate 
a sentence, for example:

 echo this is a small house | /home/lingua/smt/moses/bin/moses -f 
model/moses.ini

This works well for each language pair.

Then I produce an IRSTLM binary LM for each language pair, for example:

/home/lingua/smt/irstlm/bin/compile-lm 
complete_fr.truecased_unique_tok_clean.lm 
complete_fr.truecased_unique_tok_clean.blm
ln -s complete_fr.truecased_unique_tok_clean.blm 
complete_fr.truecased_unique_tok_clean.blm.mm

and I produce binary phrase tables and binary reordering tables:

gzip -cd fr-en/f_en.e_fr/model/phrase-table.gz | LC_ALL=C sort | 
/home/lingua/smt/moses/bin/processPhraseTable -ttable 0 0 - -nscores 5 -out  
fr-en/f_en.e_fr/model/phrase-table
gzip -cd fr-en/f_en.e_fr/model/reordering-table.wbe-msd-bidirectional-fe.gz | 
LC_ALL=C sort | /home/lingua/smt/moses/bin/processLexicalTable -out 
fr-en/f_en.e_fr/model/reordering-table

Then I create a copy of moses.ini (-moses-bin.ini) and set moses-bin.ini to 
use the binary files.


Now I try to translate a sentence with:

 echo this is a small house | TMP=/tmp /home/lingua/smt/moses/bin/moses -v 2 
-f model/moses-bin.ini


This works well for each language pair, except for the language pair f: en, e: 
fr.

The output is:

Defined parameters (per moses.ini or switch):
 config: model/moses-bin.ini
 distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 
/home/lingua/Patricia/Corpora/Corpora_Biling/fr-en/f_en.e_fr/model/reordering-table
 distortion-limit: 6
 input-factors: 0
 lmodel-file: 1 0 3 
/home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm
 mapping: 0 T 0
 ttable-file: 1 0 0 5 
/home/lingua/Patricia/Corpora/Corpora_Biling/fr-en/f_en.e_fr/model/phrase-table
 ttable-limit: 20
 verbose: 2
 weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 weight-l: 0.5000
 weight-t: 0.20 0.20 0.20 0.20 0.20
 weight-w: -1
input type is: text input
Loading lexical distortion models...have 1 models
Creating lexical reordering...
weights: 0.300 0.300 0.300 0.300 0.300 0.300
binary file loaded, default OFF_T: -1
Start loading LanguageModel 
/home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm
 : [0.000] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of 
/home/lingua/Patricia/Corpora/Corpora_Monoling_Complete/fr/complete_fr.truecased_unique_tok_clean.blm.mm
 is 1
blmt
loadbin()
lmtable::loadbin_dict()
dictionary::loadtxt wrong entry was found (0) in position 1

I don't understand the reason for this error. Could you help me with this 
problem?

Thank you,
Patricia




Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

2012-07-03 Thread Patricia Helmich

Hi Nick,

for

/home/lingua/smt/irstlm/bin/compile-lm 
complete_fr.truecased_unique_tok_clean.lm --eval 
/home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr

I get the following output:

inpfile: complete_fr.truecased_unique_tok_clean.lm
loading up to the LM level 1000 (if any)
dub: 1000
Language Model Type of complete_fr.truecased_unique_tok_clean.lm is 1
\data\
loadtxt_ram()
1-grams: reading 1091677 entries
done level1
2-grams: reading 13524189 entries
..done level2
3-grams: reading 23061450 entries
done level3
done
OOV code is 218081
OOV code is 218081
Start Eval
OOV code: 218081
%% Nw=58714 PP=201.88 PPwp=5.70 Nbo=19233 Noov=105 OOV=0.18%
lmtable class statistics
levels 3
lev 1 entries 1091677 used mem 15.62Mb
lev 2 entries 13524189 used mem 193.47Mb
lev 3 entries 23061450 used mem 153.95Mb
total allocated mem 363.03Mb
total number of get and binary search calls
level 1 get: 3042 bsearch: 0
level 2 get: 58713 bsearch: 23178875
level 3 get: 58712 bsearch: 55672


For

/home/lingua/smt/irstlm/bin/compile-lm 
complete_fr.truecased_unique_tok_clean.blm --eval 
/home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr

I get the same error as in the phrase training:

inpfile: complete_fr.truecased_unique_tok_clean.blm
loading up to the LM level 1000 (if any)
dub: 1000
Language Model Type of complete_fr.truecased_unique_tok_clean.blm is 1
blmt
loadbin()
lmtable::loadbin_dict()
dictionary::loadtxt wrong entry was found (0) in position 1


Best,Patricia
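
When comparing several `compile-lm --eval` runs, it can help to pull the numbers out of the `%%` summary line rather than reading them by eye. The following Python sketch does that for the line reported in this message. The function name `parse_eval_summary` is made up, and the reading of `OOV` as `Noov / Nw` in percent is inferred from the numbers shown here, not taken from the IRSTLM documentation.

```python
import re


def parse_eval_summary(line):
    """Turn a '%% Nw=... PP=...' summary line into a dict of floats."""
    fields = re.findall(r"(\w+)=([-+]?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in fields}


arpa_run = parse_eval_summary(
    "%% Nw=58714 PP=201.88 PPwp=5.70 Nbo=19233 Noov=105 OOV=0.18%"
)

# Assumption: the reported OOV figure is Noov / Nw expressed in percent.
oov_percent = 100.0 * arpa_run["Noov"] / arpa_run["Nw"]

print(arpa_run)
print(f"OOV rate recomputed: {oov_percent:.2f}%")
```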






Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

2012-07-03 Thread Nicholas Ruiz
Hi Patricia,

Unfortunately, I'm not so well versed in SRILM, so I'm not sure I can answer 
the question about the blank line appearing in your ARPA file. You can also try 
training your model directly with IRSTLM (in text format) and you can see if 
the blank line also appears.

tlm -tr=corpus -lm=[wb|msb] -n=3 -o=complete_fr.truecased_unique_tok_irst.lm

(I'm not sure what your original params were for the SRI model)
wb=Witten-Bell Smoothing
msb=Modified Shift-Beta Smoothing

Best,
Nick


From: Patricia Helmich [patriciahelm...@hotmail.com]
Sent: Tuesday, July 03, 2012 5:38 PM
To: Nicholas Ruiz
Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry 
was found (0) in position 1

Hi Nick,

ok, here are the first 10 lines of the BLM:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n 
complete_fr.truecased_unique_tok_clean.blm | head
 1  blmt 3 1091677 13524189 23061450
 2  1091677
 3
 0
 4  ! 0
 5  " 0
 6  # 0
 7  $ 0
 8  % 0
 9  & 0
10  ' 0



It seems that the third line causes the problems because I deleted it in a copy 
of the BLM

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n 
complete_fr.truecased_unique_tok_clean_copy.blm | head
 1  blmt 3 1091677 13524189 23061450
 2  1091677
 3  ! 0
 4  " 0
 5  # 0
 6  $ 0
 7  % 0
 8  & 0
 9  ' 0
10  '00 0

and then I tried to compute the perplexity with the copy of the BLM and it 
worked well:

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ 
/home/lingua/smt/irstlm/bin/compile-lm 
complete_fr.truecased_unique_tok_clean_copy.blm --eval 
/home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
inpfile: complete_fr.truecased_unique_tok_clean_copy.blm
loading up to the LM level 1000 (if any)
dub: 1000
Language Model Type of complete_fr.truecased_unique_tok_clean_copy.blm is 1
blmt
loadbin()
lmtable::loadbin_dict()
dict-size(): 1091677
loadbin_level (level 1)
loading 1091677 1-grams
done (level1)
loadbin_level (level 2)
loading 13524189 2-grams
done (level2)
loadbin_level (level 3)
loading 23061450 3-grams
done (level3)
done
OOV code is 218080
Start Eval
OOV code: 218080
%% Nw=58714 PP=1.03 PPwp=0.03 Nbo=58713 Noov=105 OOV=0.18%
lmtable class statistics
levels 3
lev 1 entries 1091677 used mem 15.62Mb
lev 2 entries 13524189 used mem 193.47Mb
lev 3 entries 23061450 used mem 153.95Mb
total allocated mem 363.03Mb
total number of get and binary search calls
level 1 get: 58714 bsearch: 0
level 2 get: 58713 bsearch: 117425
level 3 get: 58712 bsearch: 0


In the LM, I have also this empty line

lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n 
complete_fr.truecased_unique_tok_clean.lm | head
 1
 2  \data\
 3  ngram 1=1091677
 4  ngram 2=13524189
 5  ngram 3=23061450
 6
 7  \1-grams:
 8  -7.154682
-0.1456359
 9  -3.339167   !   -1.472732
10  -2.43139   -0.71

but in the phrase training or the perplexity computation with the LM, this does 
not cause any problems.

Also, I'm wondering why there is an entry for an empty line in the LM because I 
checked my french corpus and it does not contain any empty lines.


Best, Patricia








 From: nicr...@fbk.eu
 To: patriciahelm...@hotmail.com
 Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry 
 was found (0) in position 1
 Date: Tue, 3 Jul 2012 14:59:57 +

 Hi Patricia,

 Could you also send me the top 10 lines of your binarized LM?

 head complete_fr.truecased_unique_tok_clean.blm

 Thanks,
 Nick

 
 From: Patricia Helmich [patriciahelm...@hotmail.com]
 Sent: Tuesday, July 03, 2012 4:40 PM
 To: Nicholas Ruiz; moses-support@mit.edu
 Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry 
 was found (0) in position 1

 Hi Nick,

 for

 /home/lingua/smt/irstlm/bin/compile-lm 
 complete_fr.truecased_unique_tok_clean.lm --eval 
 /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr

 I get the following output:

 inpfile: complete_fr.truecased_unique_tok_clean.lm
 loading up to the LM level 1000 (if any)
 dub: 1000
 Language Model Type of complete_fr.truecased_unique_tok_clean.lm is 1
 \data\
 loadtxt_ram()
 1-grams: reading 1091677 entries
 done level1
 2-grams: reading 13524189 entries
 ..done level2
 3-grams: reading 23061450 entries
 done level3
 done
 OOV code is 218081
 OOV code is 218081
 Start Eval
 OOV code: 218081
 %% Nw=58714 PP=201.88 PPwp=5.70 Nbo=19233 Noov=105 OOV=0.18%
 lmtable class statistics
 levels 3
 lev 1 entries 1091677 used mem 15.62Mb
 lev 2 entries 13524189 used mem 193.47Mb
 lev 3 entries 23061450 used mem 153.95Mb
 total allocated mem 363.03Mb
 total number of get and binary search calls
 level 1 get: 3042 bsearch: 0
 level 2 get: 58713 bsearch

Re: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong entry was found (0) in position 1

2012-07-03 Thread Barry Haddow
Hi Patricia

It looks like you have some odd characters in your corpus - perhaps vertical 
tabs. You could use xxd on the lm file to try to figure out what it is,

cheers - Barry
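
As an alternative to reading an xxd dump by eye, a short script can list where such characters occur. This is a minimal Python sketch, not something posted in the thread: the function name `find_control_chars` and the choice of which bytes count as suspect are assumptions, and the demo file is invented.

```python
import os
import tempfile

# Control bytes other than tab, newline and carriage return, plus DEL.
SUSPECT = {b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D)} | {0x7F}


def find_control_chars(path, limit=20):
    """Return up to `limit` (line_number, column, byte_value) hits for suspect bytes."""
    hits = []
    with open(path, "rb") as handle:
        for line_number, line in enumerate(handle, start=1):
            for column, byte in enumerate(line, start=1):
                if byte in SUSPECT:
                    hits.append((line_number, column, byte))
                    if len(hits) >= limit:
                        return hits
    return hits


# Tiny self-contained demo: the second line hides a vertical tab (0x0b).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"une petite maison\nceci est\x0b un test\n")
    demo_path = tmp.name

demo_hits = find_control_chars(demo_path)
os.remove(demo_path)

for line_number, column, byte in demo_hits:
    print(f"line {line_number}, column {column}: byte 0x{byte:02x}")
```

Running `find_control_chars` on the training corpus or on the ARPA file would point at the offending line, which can then be cleaned before rebuilding the LM.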

On Tuesday 03 July 2012 16:46:35 Nicholas Ruiz wrote:
 Hi Patricia,
 
 Unfortunately, I'm not so well versed in SRILM, so I'm not sure I can
  answer the question about the blank line appearing in your ARPA file. You
  can also try training your model directly with IRSTLM (in text format) and
  you can see if the blank line also appears.
 
 tlm -tr=corpus -lm=[wb|msb] -n=3
  -o=complete_fr.truecased_unique_tok_irst.lm
 
 (I'm not sure what you original params were for the SRI model)
 wb=Witten-Bell Smoothing
 msb=Modified Shift-Beta Smoothing
 
 Best,
 Nick
 
 
 From: Patricia Helmich [patriciahelm...@hotmail.com]
 Sent: Tuesday, July 03, 2012 5:38 PM
 To: Nicholas Ruiz
 Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong
  entry was found (0) in position 1
 
 Hi Nick,
 
 ok, here are the first 10 lines of the BLM:
 
 lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n
  complete_fr.truecased_unique_tok_clean.blm | head 1  blmt 3 1091677
  13524189 23061450
  2  1091677
  3
  0
  4  ! 0
  5   0
  6  # 0
  7  $ 0
  8  % 0
  9   0
 10  ' 0
 
 
 
 It seems that the third line causes the problems because I deleted it in a
  copy of the BLM
 
 lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n
  complete_fr.truecased_unique_tok_clean_copy.blm | head 1  blmt 3 1091677
  13524189 23061450
  2  1091677
  3  ! 0
  4   0
  5  # 0
  6  $ 0
  7  % 0
  8   0
  9  ' 0
 10  '00 0
 
 and then I tried to compute the perplexity with the copy of the BLM and it
  worked well:
 
 lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$
  /home/lingua/smt/irstlm/bin/compile-lm
  complete_fr.truecased_unique_tok_clean_copy.blm --eval
  /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.t
 ok.fr inpfile: complete_fr.truecased_unique_tok_clean_copy.blm
 loading up to the LM level 1000 (if any)
 dub: 1000
 Language Model Type of complete_fr.truecased_unique_tok_clean_copy.blm is 1
 blmt
 loadbin()
 lmtable::loadbin_dict()
 dict-size(): 1091677
 loadbin_level (level 1)
 loading 1091677 1-grams
 done (level1)
 loadbin_level (level 2)
 loading 13524189 2-grams
 done (level2)
 loadbin_level (level 3)
 loading 23061450 3-grams
 done (level3)
 done
 OOV code is 218080
 Start Eval
 OOV code: 218080
 %% Nw=58714 PP=1.03 PPwp=0.03 Nbo=58713 Noov=105 OOV=0.18%
 lmtable class statistics
 levels 3
 lev 1 entries 1091677 used mem 15.62Mb
 lev 2 entries 13524189 used mem 193.47Mb
 lev 3 entries 23061450 used mem 153.95Mb
 total allocated mem 363.03Mb
 total number of get and binary search calls
 level 1 get: 58714 bsearch: 0
 level 2 get: 58713 bsearch: 117425
 level 3 get: 58712 bsearch: 0
 
 
 In the LM, I have also this empty line
 
 lingua@StatMT24:~/Patricia/Corpora/Corpora_Monoling_Complete/fr$ cat -n
  complete_fr.truecased_unique_tok_clean.lm | head 1
  2  \data\
  3  ngram 1=1091677
  4  ngram 2=13524189
  5  ngram 3=23061450
  6
  7  \1-grams:
  8  -7.154682
 -0.1456359
  9  -3.339167   !   -1.472732
 10  -2.43139   -0.71
 
 but in the phrase training or the perplexity computation with the LM, this
  does not cause any problems.
 
 Also, I'm wondering why there is an entry for an empty line in the LM
  because I checked my french corpus and it does not contain any empty
  lines.
 
 
 Best, Patricia
 
  From: nicr...@fbk.eu
  To: patriciahelm...@hotmail.com
  Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong
  entry was found (0) in position 1 Date: Tue, 3 Jul 2012 14:59:57 +
 
  Hi Patricia,
 
  Could you also send me the top 10 lines of your binarized LM?
 
  head complete_fr.truecased_unique_tok_clean.blm
 
  Thanks,
  Nick
 
  
  From: Patricia Helmich [patriciahelm...@hotmail.com]
  Sent: Tuesday, July 03, 2012 4:40 PM
  To: Nicholas Ruiz; moses-support@mit.edu
  Subject: RE: [Moses-support] IRSTLM - Error: dictionary::loadtxt wrong
  entry was found (0) in position 1
 
  Hi Nick,
 
  for
 
  /home/lingua/smt/irstlm/bin/compile-lm
  complete_fr.truecased_unique_tok_clean.lm --eval
  /home/lingua/Patricia/Corpora/Corpora_Eval/devtest/nc-test2007.truecased.tok.fr
 
  I get the following output:
 
  inpfile: complete_fr.truecased_unique_tok_clean.lm
  loading up to the LM level 1000 (if any)
  dub: 1000
  Language Model Type of complete_fr.truecased_unique_tok_clean.lm is 1
  \data\
  loadtxt_ram()
  1-grams: reading 1091677 entries
  done level1
  2-grams: reading 13524189 entries
  ..done level2
  3-grams: reading 23061450 entries
  done level3
  done
  OOV code is 218081
  OOV code is 218081

Re: [Moses-support] irstlm how to use caching feature?

2012-04-20 Thread Nicola Bertoldi
Dear Somayeh,

I am answering the same email that you sent to the IRSTLM support list
(user-irs...@list.fbk.eu), because that list is more appropriate for your
problem.

best
Nicola


On Apr 20, 2012, at 11:35 AM, somayeh bakhshaei wrote:

Hello all,

I am trying to use the caching option of IRSTLM, so I configured it with
-enable-caching.

The LM was built.

Then I tried to compile it and convert it to ARPA format:

compile-lm lm --text yes out

but it gives this error:

Reading /Share/local/bakhshaei/ITRC/en-fr/cacheLm/integratedv0.4-cache.gz...
iARPA
loadtxt()
1-grams: reading 299158 entries
line=-5.776930
compile-lm: lmtable.cpp:237: int parseline(std::istream, int, ngram, float, 
float): Assertion `howmany == (Order+ 1) || howmany == (Order + 2)' failed.
Aborted


Could you please tell me what is wrong?
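
The assertion in the error message says that a line in the N-gram section of
order `Order` must have either Order + 1 fields (log-probability plus the
words) or Order + 2 fields (the same plus a back-off weight). The reported
line, "-5.776930", has only one field. A rough way to locate such lines
before running compile-lm is sketched below in Python. This is not IRSTLM
code: the function name is hypothetical, the sample data is illustrative, and
splitting on whitespace is only an approximation of how the real parser
tokenizes a line.

```python
def bad_ngram_lines(lines):
    """Yield (line_number, order, text) for n-gram lines with an unexpected field count."""
    order = 0  # 0 means "not inside an \N-grams: section"
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\1-grams:" sets the current order.
            order = int(line[1:-len("-grams:")])
            continue
        if line.startswith("\\"):
            # Other markers such as \data\ or \end\ close the section.
            order = 0
            continue
        if order == 0 or not line:
            continue
        howmany = len(line.split())
        if howmany not in (order + 1, order + 2):
            yield number, order, line

# Illustrative sample only; the second 1-gram line has no word field.
SAMPLE = [
    "\\data\\",
    "ngram 1=3",
    "",
    "\\1-grams:",
    "-3.339167\t!\t-1.472732",
    "-5.776930",
    "-2.43139\t,",
    "\\end\\",
]

problems = list(bad_ngram_lines(SAMPLE))
for number, order, text in problems:
    print(f"line {number}: {order}-gram entry has a bad field count: {text!r}")
```

For a real file the lines could be read with gzip.open(path, "rt") instead of
the in-memory sample.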


-
Best Regards,
S.Bakhshaei

After All you will come 
And will spread light on the dark desolate world!
O' Kind Father! We will be waiting for your affectionate hands ...

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support



Re: [Moses-support] IRSTLM error: converting iARPA to ARPA format

2010-04-21 Thread Lee Ball (Applied Language)
Hello Zahurul,

Have you tried the latest release of IRSTLM? It is currently at 5.40.01
which is available from here:

http://hlt.fbk.eu/en/irstlm

Updates since 5.22 are:

B.10 Version 5.30
- Support for a safe management of LMs with a total amount of n-grams larger
  than 250 million
- Use of a new parameter to specify a directory for temporary computation
  because the default ("/tmp") could be too small
- Improved a safer method of concatenation of gzipped sub lms
- Improved management of log files

B.11 Version 5.40
- Merging of internal-only tlm code into the public version
- Updated documentation into the public version
- Included documentation into the public version

Kind regards,

Lee Ball
Infrastructure Manager
lee.b...@appliedlanguage.com


On 21 April 2010 09:57, Zahurul Islam zai...@gmail.com wrote:

 Hi,
 I am trying to build a language model from a large amount of text (13GB). In
 the step of converting from iARPA to ARPA format I got the following error:

 /tools/irstlm-5.22.01/bin/compile-lm wiki.it.truecase.ilm.gz --text yes
 wiki.it.lm
 inpfile: wiki.it.truecase.ilm.gz
 dub: 1000
 Reading wiki.it.truecase.ilm.gz...
 iARPA
 loadtxt()
 terminate called after throwing an instance of 'std::bad_alloc'
   what():  std::bad_alloc
 /tools/irstlm-5.22.01/bin/compile-lm: line 9: 20328 Aborted
 $dir/$name $@

 Any help to identify|solve this problem will be appreciated. Thank you very
 much.

 Regards,
 Zahurul


Re: [Moses-support] IRSTLM error: converting iARPA to ARPA format

2010-04-21 Thread Miles Osborne
this means you have run out of memory.

you can either:

- get more memory
- use less data
- use a lower-order LM
- use RandLM, which can easily handle this amount of data (i am
  currently building LMs using more than 30 billion words with it, for
  example)

Miles


Re: [Moses-support] IRSTLM error: converting iARPA to ARPA format

2010-04-21 Thread Nicola Bertoldi
Dear Zahurul

The newest release of IRSTLM (5.40.01) should solve your problem, which is 
probably related to the size.

Please download from here:
http://hlt.fbk.eu/en/irstlm


There is an official mailing list for IRSTLM, you can join from here
https://list.fbk.eu/sympa/subscribe/user-irstlm

The mail address to submit your question is:
user-irstlm   AT   list  DOT  fbk  DOT  eu


Nicola



Re: [Moses-support] irstlm toolkit

2009-05-06 Thread Marcin Miłkowski
m...@bezeqint.net wrote:
 Hello,
 I found out that you can create huge language models using the
 IRSTLM toolkit. In the Moses tutorial I found this example:
 
 build-lm.sh -i "gunzip -c corpus.gz" -n 3 -o train.irstlm.gz -k 10
 
 Until now I have used the SRILM toolkit with two flags:
 -interpolate -kndiscount
 
 Can IRSTLM produce the same result as SRILM (with these flags)?

If you mean the standard ARPA format, no, it doesn't. But there is a 
command (compile-lm, look in the IRSTLM docs) to create an ARPA-formatted 
LM if you really need one. Moses does support IRSTLM files natively, 
though, so you shouldn't need it.
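
For anyone scripting these two steps, here is a small Python sketch that only
assembles the command lines and does not run anything. It uses nothing but
the options that appear on this list (build-lm.sh with -i, -n, -o, -k and
compile-lm with --text yes). The function and parameter names are
hypothetical, and reading -k as a number of splits is an assumption.

```python
import shlex

def build_lm_command(corpus_gz, order, out_file, splits):
    """Argument list for build-lm.sh, as in the Moses tutorial example."""
    # The whole "gunzip -c <file>" string is one argument to -i.
    reader = f"gunzip -c {shlex.quote(corpus_gz)}"
    return ["build-lm.sh", "-i", reader, "-n", str(order),
            "-o", out_file, "-k", str(splits)]

def to_arpa_command(lm_file, arpa_file):
    """Argument list for converting the LM to ARPA text with compile-lm."""
    return ["compile-lm", lm_file, "--text", "yes", arpa_file]

build_cmd = build_lm_command("corpus.gz", 3, "train.irstlm.gz", 10)
arpa_cmd = to_arpa_command("train.irstlm.gz", "train.lm")

# shlex.join adds the quoting a shell would need.
print(shlex.join(build_cmd))
print(shlex.join(arpa_cmd))
```

The lists can be passed to subprocess.run(cmd, check=True) once the IRSTLM
binaries are on the PATH. The output name train.lm is just an example.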


Regards
Marcin


Re: [Moses-support] IRSTLM compile failure

2009-03-12 Thread Tom Hoar
Hi Laxmi,

I eliminated VMware as a problem. I built a hardware box with Ubuntu 8.10
and I have the same problem.

I can not build IRSTLM. Compiling IRSTLM fails with the same message below
on a new Ubuntu 8.10 hardware machine.

Does anyone have a complete list of all dependencies necessary to build
IRSTLM?

Thanks,
Tom

On Wed, Mar 11, 2009 at 11:14 PM, moses-support-requ...@mit.edu wrote:


   1. Re: Moses-support Digest, Vol 29, Issue 10 (Laxmi Khatiwada)


 --

 Message: 1
 Date: Wed, 11 Mar 2009 11:26:02 +0545
 From: Laxmi Khatiwada lkhatiw...@gmail.com
 Subject: Re: [Moses-support] Moses-support Digest, Vol 29, Issue 10
 To: moses-support@mit.edu
 Message-ID:
ae3696650903102241w5720c17ated4b90c971ba7...@mail.gmail.com
 Content-Type: text/plain; charset=utf-8

 Hi Mosesians,

 I am also facing the same problem. I was able to configure the machine type
 in the SRILM makefile.

 But I don't know anything about VMware Player. Is it necessary in order to
 run SRILM or Moses?

 Laxmi



 On Tue, Mar 10, 2009 at 10:04 PM, moses-support-requ...@mit.edu wrote:

  Today's Topics:
 
1. IRSTLM compile failure (Tom Hoar)
 
 
  --
 
  Message: 1
  Date: Mon, 9 Mar 2009 23:32:48 +0700
  From: Tom Hoar tah...@gmail.com
  Subject: [Moses-support] IRSTLM compile failure
  To: moses-support@mit.edu
  Message-ID:
 608377160903090932h7a6e1d06m9ce85a487238f...@mail.gmail.com
  Content-Type: text/plain; charset=iso-8859-1
 
  I can't seem to find a support contact for IRSTLM, so I'd appreciate if
  someone can please point me in the right direction.
 
  I've set up an Ubuntu 8.10 system, GNU Make 3.81, VMware hardware version 4
  (virtual machine). I have collected all dependencies and can compile the
  Moses decoder with GIZA++ and SRILM to make a working system. I'm now
  starting with a new copy of the Moses source and trying to compile it with
  IRSTLM.
 
  Thanks in advance for any help you can offer, or pointing me in the right
  direction.
  Best regards,
  Tom
 
  Steps:
  1) sudo svn co https://irstlm.svn.sourceforge.net/svnroot/irstlm irstlm
  2) cd irstlm
  3) sudo ./install
 
  * Output *
  MACHTYPE is actually undefined
  Set environment variable MACHTYPE with uname -m
  OSTYPE is actually undefined
  Set environment variable OSTYPE with uname -s
  CREATING DIRECTORIES
  CREATING ALIASES FOR OTHER MACHINE TYPES
  COMPILING CODE
  rm *.o
  rm: cannot remove `*.o': No such file or directory
  make: *** [clean] Error 1
  cc  -O3 -DMYCODESIZE=3  -Wall --no-strict-aliasing  -c -o cmd.o cmd.c
  g++ -static -c  -O3 -DMYCODESIZE=3  -Wall --no-strict-aliasing util.cpp -o util.o
  util.cpp: In function 'void createtempfile(std::ofstream, std::string, std::_Ios_Openmode)':
  util.cpp:49: warning: ignoring return value of 'int mkstemp(char*)', declared with attribute warn_unused_result
  util.cpp: In function 'void removefile(const std::string)':
  util.cpp:62: warning: ignoring return value of 'int system(const char*)', declared with attribute warn_unused_result
  g++ -static -c  -O3 -DMYCODESIZE=3  -Wall --no-strict-aliasing mempool.cpp -o mempool.o
  g++ -static -c  -O3 -DMYCODESIZE=3  -Wall --no-strict-aliasing htable.cpp -o htable.o
  g++ -static -c  -O3 -DMYCODESIZE=3  -Wall --no-strict-aliasing dictionary.cpp -o dictionary.o
  In file included from dictionary.cpp:23:
  mfstream.h: In member function 'virtual int fdbuf::underflow()':
  mfstream.h:98: error: 'memmove' is not a member of 'std'
  In file included from dictionary.cpp:31:
  dictionary.h: In member function 'char* dictionary::OOV()':
  dictionary.h:89: warning: deprecated conversion from string constant to 'char*'
  dictionary.h: In member function 'char* dictionary::BoS()':
  dictionary.h:90: warning: deprecated conversion from string constant to 'char*'
  dictionary.h: In member function 'char* dictionary::EoS()':
  dictionary.h:91: warning: deprecated conversion from string constant to 'char*'
  make: *** [dictionary.o] Error 1
  INSTALLING INCLUDE DIR
  INSTALLING SCRIPTS
  INSTALLING ARCHITECTURE-INDEPENDENT WRAPPERS
  * end Output *
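
  Most of the output above is warnings. Only the "error:" line stops the
  build, and everything after it (empty bin and lib directories) follows from
  that. In C++, std::memmove is declared in the <cstring> header, so this
  message usually means that header is not included. Whether that is what is
  missing in mfstream.h is an assumption here, not something the log proves.
  A small Python sketch for pulling the errors out of such a log follows. The
  function name and the regular expression are illustrative and only cover
  the "file:line: kind: message" shape seen in this output.

```python
import re

# One diagnostic per line, e.g. "mfstream.h:98: error: ...".
DIAGNOSTIC = re.compile(
    r"^\s*(?P<file>[\w.]+):(?P<line>\d+): (?P<kind>error|warning): (?P<message>.*)$",
    re.MULTILINE,
)

def diagnostics(log_text):
    """Return (kind, file, line, message) tuples found in compiler output."""
    return [(m["kind"], m["file"], int(m["line"]), m["message"])
            for m in DIAGNOSTIC.finditer(log_text)]

# Lines taken from the build output above, with wrapped lines rejoined.
LOG = """\
util.cpp:49: warning: ignoring return value of 'int mkstemp(char*)', declared with attribute warn_unused_result
mfstream.h:98: error: 'memmove' is not a member of 'std'
dictionary.h:89: warning: deprecated conversion from string constant to 'char*'
make: *** [dictionary.o] Error 1
"""

found = diagnostics(LOG)
errors = [d for d in found if d[0] == "error"]

for kind, file_name, line_no, message in errors:
    print(f"{file_name}:{line_no}: {message}")
```

  On the sample this reports a single error, at mfstream.h line 98, and
  ignores the two warnings and the make summary line.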
 
  * Results *
  Created: /usr/local/src/irstlm/bin, contains i686 and scripts
  Created: /usr/local/src/irstlm/bin/i686, empty
  Created: /usr/local/src/irstlm/lib, contains i686 and no files
  Created: /usr/local/src/irstlm/lib/i686, empty
  Created: /usr/local/src/irstlm/include, has .h files
  * end Results *
 
  Other:
  uname -m == i686
  uname -s == Linux
  uname -r == 2.6.27-11-generic

Re: [Moses-support] [IRSTLM] No output file after build-lm.sh

2008-07-25 Thread Miguel José Hernández Vidal
Hi,

I agree with you, it's kind of weird. As you said, I used compile-lm 
to convert my SRI language model to a binary format. My first 
attempt was to run the decoder compiled with IRSTLM, but I got the 
segmentation fault error.

Then I ran the decoder compiled with SRILM, using the following LM line: 
0 0 5 /home/esca/ESCA/lm/ca.blm. The decoder ran, but the translation 
wasn't good at all. That was my mistake: it seems that a language model 
in IRST binary format isn't supposed to work with a decoder compiled 
with SRILM.
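
To make the difference between the two moses.ini lines explicit: the first
number on the LM line selects the LM implementation. Below is a minimal
Python sketch, assuming the four-field layout used in this thread
(implementation, factor, order, path) and only the two codes that appear
here, 0 for SRILM and 1 for IRSTLM. `LMEntry` and `parse_lmodel_line` are
hypothetical names, not part of Moses.

```python
from typing import NamedTuple

# Only the two implementation codes that appear in this thread.
LM_TYPES = {0: "SRILM", 1: "IRSTLM"}

class LMEntry(NamedTuple):
    implementation: str
    factor: int
    order: int
    path: str

def parse_lmodel_line(line):
    """Split an LM line such as '1 0 5 /path/to/lm' into its four fields."""
    lm_type, factor, order, path = line.split(maxsplit=3)
    name = LM_TYPES.get(int(lm_type), "unknown")
    return LMEntry(name, int(factor), int(order), path)

irst_entry = parse_lmodel_line("1 0 5 /home/esca/ESCA/lm/ca.blm")
sri_entry = parse_lmodel_line("0 0 5 /home/esca/ESCA/lm/ca.blm")

print(irst_entry)
print(sri_entry)
```

Both lines point at the same ca.blm file. Only the first field changes, and
it decides which LM code in the decoder tries to read the file.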

My last attempt was to use SRI's binary format. Following SRI's FAQ 
(http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html), I 
rebuilt my language model with the command ngram-count ... -lm /NEWLM/ 
-write-binary-lm. The decoder ran and the output was OK. The binary 
file was smaller than the unbinarized SRI language model and loaded 
faster, but loading still took an average of 7 seconds on a quad-core 
system.

My goal was to minimize the loading time of the translation and 
reordering tables and of the LM. The tables load almost instantly, but 
the LM does not.

I would like to know what the differences between the IRST and SRI 
binary formats are, and whether one of them is better. At first I thought 
that the only way to get a binary LM was to use the IRST tools, as SRI's 
method isn't mentioned in the Moses documentation. Is there any reason 
not to use the SRI binary format?

Thanks for your help.

Regards,

Miguel

Philipp Koehn wrote:
 Hi,

 this is very weird. You are using the 'irstlm/src/compile-lm' command,
 aren't you? I was a bit confused at first (actually still am), because
 there is also an SRILM binary format.

 -phi

 On Wed, Jul 23, 2008 at 10:50 AM, Miguel José Hernández Vidal
 [EMAIL PROTECTED] wrote:
   
 Hi Philipp,

 Thanks for your advice. Maybe I've done something wrong, although I followed
 Moses' documentation guidelines.

 First, I compiled separately a new Moses environment '--with-irstlm'.
 Next I ran the following in order to have a binarized version of my SRI
 language model:
   $ ./compile-lm corpus.ca.lm ca.blm

 Then I updated my moses.ini with the new settings:
   1 0 5 /home/esca/ESCA/lm/ca.blm

 Finally, I ran the version of moses compiled with IRSTLM and got the
 'segmentation fault' error.


 I managed to run the binarized SRI model in the following way:

 After 'compile-lm' I updated moses.ini:
   0 0 5 /home/esca/ESCA/lm/ca.blm

 And then I ran moses (compiled with SRILM) without any errors.


 I thought binarized language models had to be decoded with the IRST compiled
 version of Moses. Am I wrong?

 Regards,
 Miguel

 Philipp Koehn wrote:
 
 Hi,

 To use the binarized IRST LM, you just need to compile the SRILM LM,
 no need to train the model with IRST tools. See Moses documentation
 for details.

 -phi

 On Tue, Jul 22, 2008 at 12:31 PM, Miguel José Hernández Vidal
 [EMAIL PROTECTED] wrote:

   
 I've also tried to run moses with a binarized (with compile-lm) SRI
 language model. When I run the decoder I see a segmentation fault error:


 ---
 [EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config ~/ESCA/model/moses.ini
 -input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
 Defined parameters (per moses.ini or switch):
   config: /home/esca/ESCA/model/moses.ini
   distortion-file: 0-0 msd-bidirectional-fe 6
 /home/esca/ESCA/model/reordering
   distortion-limit: 6
   input-factors: 0
   input-file: /home/esca/ESCA/tuning/input
   lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
   mapping: 0 T 0
   ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
   ttable-limit: 20
   weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
   weight-l: 0.5000
   weight-t: 0.2 0.2 0.2 0.2 0.2
   weight-w: -1
 Loading lexical distortion models...
 have 1 models
 Creating lexical reordering...
 weights: 0.300 0.300 0.300 0.300 0.300 0.300
 binary file loaded, default OFF_T: -1
 Created lexical orientation reordering
 Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
 In LanguageModelIRST::Load: nGramOrder = 5
 Loading LM file (no MAP)
 blmt
 loadbin()
 loading 321187 1-grams
 loading 4548952 2-grams
 loading 2785668 3-grams
 loading 2501764 4-grams
 loading 1741048 5-grams
 done
 OOV code is 37189
 IRST: m_unknownId=37189
 Fallo de segmentación (core dumped) #SEGMENTATION FAULT

 

 I am using binarized phrase and reordering tables, but they worked fine
 when I built them with my old SRILM system.

 Thanks