I haven't tested kenlm on Cygwin, but it could work.  Could you run the tests?

1) Install Boost.  Cygwin's package manager should provide it (see the sketch after the test commands below).

2) Run the kenlm tests:

wget http://kheafield.com/code/kenlm.tar.gz
tar xzf kenlm.tar.gz
cd kenlm
./test.sh
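
For step 1, something like this from a Windows command prompt should
install Boost non-interactively (the package name is my best guess; check
setup.exe's package chooser if it differs):

setup.exe -q -P libboost-devel

If test.sh passes, kenlm itself works on Cygwin and the crash must be
elsewhere.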

On 03/25/11 06:44, Sudip Datta wrote:
> I've used gcc in cygwin to compile both Moses and IRSTLM. But as you and
> Barry pointed out, I'll try kenlm (I can't use srilm here due to
> licensing restrictions) and, if that doesn't work, try srilm at my college.
> 
> I think the segfault occurs in Hypothesis.cpp at:
> 
> m_ffStates[i] = ffs[i]->Evaluate(
>             *this,
>             m_prevHypo ? m_prevHypo->m_ffStates[i] : NULL,
>             &m_scoreBreakdown);
> 
> Maybe it gives some clue in identifying the issue.
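
That line just calls Evaluate() on each feature function in turn, so the
crash is probably inside one of them rather than in Hypothesis.cpp itself.
A backtrace would tell us which. Assuming you have Cygwin's gdb and a
moses binary built with -g, roughly:

gdb --args moses -f /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
(gdb) run < input.txt
(gdb) bt

(Here input.txt just stands in for whatever file holds your test sentence.)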
> 
> Thanks again  --Sudip.
> 
> On Fri, Mar 25, 2011 at 3:51 PM, Hieu Hoang <hieuho...@gmail.com> wrote:
> 
>     If you've compiled with gcc in cygwin, you can use any LM. The
>     stipulation of using only the internal LM applies only if you use
>     Visual Studio.
> 
>     However, I would personally start with srilm, as I'm not sure whether
>     the other LMs are fully tested on cygwin.
> 
>     Hieu
>     Sent from my flying horse
> 
>     On 25 Mar 2011, at 10:06 AM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
> 
>     > Hi Sudip
>     >
>     > If you're using windows, then you should use the internal LM. See here:
>     > http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>     > afaik this is still the case.
>     >
>     > Also, there are a couple of odd things in your setup. Firstly, you've
>     > built a 3-gram LM, but you're telling moses that it's 2-gram:
>     >> [lmodel-file]
>     >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
>     > This shouldn't matter, but just in case you're unaware.
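
For reference, since the load log below shows 3-grams being read, the
matching line would presumably be

1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz

i.e. the same irstlm type and factor, with the order bumped to 3.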
>     >
>     > Also, both the words in your input sentence are unknown. Did the
>     > phrase table build OK? Maybe you could use zless or zcat to extract
>     > and post the first few lines of it.
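
Something like this should do it (path taken from the moses.ini below):

zcat /cygdrive/d/moses/fi-en/fienModel/model/phrase-table.gz | head -5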
>     >
>     > best regards - Barry
>     >
>     > On Friday 25 March 2011 08:13, Sudip Datta wrote:
>     >> Hi,
>     >>
>     >> I am a noob at using Moses and have been trying to build a model and
>     >> then use the decoder to translate test sentences. I used the
>     >> following command for training:
>     >>
>     >> train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/
>     >> --corpus /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
>     >> 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
>     >>
>     >> The process ended cleanly with the following moses.ini file:
>     >>
>     >> # input factors
>     >> [input-factors]
>     >> 0
>     >>
>     >> # mapping steps
>     >> [mapping]
>     >> 0 T 0
>     >>
>     >> # translation tables: table type (hierarchical(0), textual (0),
>     >> binary (1)), source-factors, target-factors, number of scores, file
>     >> # OLD FORMAT is still handled for back-compatibility
>     >> # OLD FORMAT translation tables: source-factors, target-factors,
>     >> number of scores, file
>     >> # OLD FORMAT a binary table type (1) is assumed
>     >> # OLD FORMAT a binary table type (1) is assumed
>     >> [ttable-file]
>     >> 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
>     >>
>     >> # no generation models, no generation-file section
>     >>
>     >> # language models: type(srilm/irstlm), factors, order, file
>     >> [lmodel-file]
>     >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
>     >>
>     >>
>     >> # limit on how many phrase translations e for each phrase f are
>     >> loaded
>     >> # 0 = all elements loaded
>     >> [ttable-limit]
>     >> 20
>     >>
>     >> # distortion (reordering) weight
>     >> [weight-d]
>     >> 0.6
>     >>
>     >> # language model weights
>     >> [weight-l]
>     >> 0.5000
>     >>
>     >>
>     >> # translation model weights
>     >> [weight-t]
>     >> 0.2
>     >> 0.2
>     >> 0.2
>     >> 0.2
>     >> 0.2
>     >>
>     >> # no generation models, no weight-generation section
>     >>
>     >> # word penalty
>     >> [weight-w]
>     >> -1
>     >>
>     >> [distortion-limit]
>     >> 6
>     >>
>     >> But the decoding step ends with a segfault, with the following
>     >> output for -v 3:
>     >>
>     >> Defined parameters (per moses.ini or switch):
>     >>        config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
>     >>        distortion-limit: 6
>     >>        input-factors: 0
>     >>        lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
>     >>        mapping: 0 T 0
>     >>        ttable-file: 0 0 0 5
>     >> /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
>     >>        ttable-limit: 20
>     >>        verbose: 100
>     >>        weight-d: 0.6
>     >>        weight-l: 0.5000
>     >>        weight-t: 0.2 0.2 0.2 0.2 0.2
>     >>        weight-w: -1
>     >> input type is: text input
>     >> Loading lexical distortion models...have 0 models
>     >> Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz :
>     >> [0.000] seconds
>     >> In LanguageModelIRST::Load: nGramOrder = 2
>     >> Loading LM file (no MAP)
>     >> iARPA
>     >> loadtxt()
>     >> 1-grams: reading 3195 entries
>     >> 2-grams: reading 13313 entries
>     >> 3-grams: reading 20399 entries
>     >> done
>     >> OOV code is 3194
>     >> OOV code is 3194
>     >> IRST: m_unknownId=3194
>     >> creating cache for storing prob, state and statesize of ngrams
>     >> Finished loading LanguageModels : [1.000] seconds
>     >> About to LoadPhraseTables
>     >> Start loading PhraseTable
>     >> /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000]
>     >> seconds
>     >> filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
>     >> using standard phrase tables
>     >> PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
>     >> Finished loading phrase tables : [1.000] seconds
>     >> IO from STDOUT/STDIN
>     >> Created input-output object : [1.000] seconds
>     >> The score component vector looks like this:
>     >> Distortion
>     >> WordPenalty
>     >> !UnknownWordPenalty
>     >> LM_2gram
>     >> PhraseModel_1
>     >> PhraseModel_2
>     >> PhraseModel_3
>     >> PhraseModel_4
>     >> PhraseModel_5
>     >> Stateless: 1    Stateful: 2
>     >> The global weight vector looks like this: 0.600 -1.000 1.000 0.500
>     >> 0.200 0.200 0.200 0.200 0.200
>     >> Translating: istuntokauden uudelleenavaaminen
>     >>
>     >> DecodeStep():
>     >>        outputFactors=FactorMask<0>
>     >>        conflictFactors=FactorMask<>
>     >>        newOutputFactors=FactorMask<0>
>     >> Translation Option Collection
>     >>
>     >>       Total translation options: 2
>     >> Total translation options pruned: 0
>     >> translation options spanning from  0 to 0 is 1
>     >> translation options spanning from  0 to 1 is 0
>     >> translation options spanning from  1 to 1 is 1
>     >> translation options generated in total: 2
>     >> future cost from 0 to 0 is -100.136
>     >> future cost from 0 to 1 is -200.271
>     >> future cost from 1 to 1 is -100.136
>     >> Collecting options took 0.000 seconds
>     >> added hyp to stack, best on stack, now size 1
>     >> processing hypothesis from next stack
>     >>
>     >> creating hypothesis 1 from 0 ( ... )
>     >>        base score 0.000
>     >>        covering 0-0: istuntokauden
>     >>        translated as: istuntokauden|UNK|UNK|UNK
>     >>        score -100.136 + future cost -100.136 = -200.271
>     >>        unweighted feature scores: <<0.000, -1.000, -100.000,
>     >> -2.271, 0.000, 0.000, 0.000, 0.000, 0.000>>
>     >> added hyp to stack, best on stack, now size 1
>     >> Segmentation fault (core dumped)
>     >>
>     >> The only suspicious thing I found in the above is the message
>     >> 'creating hypothesis 1 from 0', but I don't know whether it is the
>     >> actual problem, or why it is happening. I believe the problem is
>     >> with the training step, since the sample models that I downloaded
>     >> from http://www.statmt.org/moses/download/sample-models.tgz work fine.
>     >>
>     >> Prior to this, I constructed an IRST LM and used clean-corpus-n.perl
>     >> for cleaning the decoder input. Looking at the archives, the closest
>     >> message I could find was
>     >> http://thread.gmane.org/gmane.comp.nlp.moses.user/1478, but I don't
>     >> think I'm making the same mistake as the author of that message.
>     >>
>     >> I'd be delighted if anybody could provide any insight into this
>     >> problem, or let me know if any further details are needed.
>     >>
>     >> Thanks,
>     >>
>     >> --Sudip.
>     >
>     > --
>     > The University of Edinburgh is a charitable body, registered in
>     > Scotland, with registration number SC005336.
>     >
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
