I've used gcc in Cygwin to compile both Moses and IRSTLM. But as you and
Barry pointed out, I'll try KenLM (I can't use SRILM here due to licensing
restrictions) and, if that doesn't work, try SRILM at my college.
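
Following Barry's note about the LM order, I'll also fix the [lmodel-file]
line so the order field (the third number, going by the format comment in my
moses.ini: "type(srilm/irstlm), factors, order, file") matches the 3-gram
model I actually built:

```ini
# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz
```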

I think the segfault occurs in Hypothesis.cpp at:

*m_ffStates[i] = ffs[i]->Evaluate(
            *this,
            m_prevHypo ? m_prevHypo->m_ffStates[i] : NULL,
            &m_scoreBreakdown);

Maybe it gives some clue for identifying the issue.

Thanks again  --Sudip.

On Fri, Mar 25, 2011 at 3:51 PM, Hieu Hoang <hieuho...@gmail.com> wrote:

> If you've compiled with gcc in cygwin, you can use any lm. The
> stipulation of using only the internal lm only applies if you use
> visual studio.
>
> However, I would personally use srilm to start with, as I'm not sure if
> the other LMs are fully tested on cygwin.
>
> Hieu
> Sent from my flying horse
>
> On 25 Mar 2011, at 10:06 AM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
>
> > Hi Sudip
> >
> > If you're using windows, then you should use the internal LM. See here:
> > http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> > afaik this is still the case.
> >
> > Also, there are a couple of odd things in your setup. Firstly, you've
> > built a 3-gram LM, but you're telling moses that it's 2-gram:
> >> [lmodel-file]
> >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > This shouldn't matter, but just in case you're unaware.
> >
> > Also, both the words in your input sentence are unknown. Did the phrase
> > table build OK? Maybe you could use zless or zcat to extract and post
> > the first few lines of it.
> >
> > best regards - Barry
> >
> > On Friday 25 March 2011 08:13, Sudip Datta wrote:
> >> Hi,
> >>
> >> I am a noob at using Moses and have been trying to build a model and
> >> then use the decoder to translate test sentences. I used the following
> >> command for training:
> >>
> >> train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/
> >> --corpus /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
> >> 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
> >>
> >> The process ended cleanly with the following moses.ini file:
> >>
> >> # input factors
> >> [input-factors]
> >> 0
> >>
> >> # mapping steps
> >> [mapping]
> >> 0 T 0
> >>
> >> # translation tables: table type (hierarchical(0), textual (0), binary
> >> (1)), source-factors, target-factors, number of scores, file
> >> # OLD FORMAT is still handled for back-compatibility
> >> # OLD FORMAT translation tables: source-factors, target-factors,
> >> number of scores, file
> >> # OLD FORMAT a binary table type (1) is assumed
> >> [ttable-file]
> >> 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >>
> >> # no generation models, no generation-file section
> >>
> >> # language models: type(srilm/irstlm), factors, order, file
> >> [lmodel-file]
> >> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >>
> >>
> >> # limit on how many phrase translations e for each phrase f are loaded
> >> # 0 = all elements loaded
> >> [ttable-limit]
> >> 20
> >>
> >> # distortion (reordering) weight
> >> [weight-d]
> >> 0.6
> >>
> >> # language model weights
> >> [weight-l]
> >> 0.5000
> >>
> >>
> >> # translation model weights
> >> [weight-t]
> >> 0.2
> >> 0.2
> >> 0.2
> >> 0.2
> >> 0.2
> >>
> >> # no generation models, no weight-generation section
> >>
> >> # word penalty
> >> [weight-w]
> >> -1
> >>
> >> [distortion-limit]
> >> 6
> >>
> >> But the decoding step ends with a segfault, with the following output
> >> for -v 3:
> >>
> >> Defined parameters (per moses.ini or switch):
> >>        config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> >>        distortion-limit: 6
> >>        input-factors: 0
> >>        lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >>        mapping: 0 T 0
> >>        ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >>        ttable-limit: 20
> >>        verbose: 100
> >>        weight-d: 0.6
> >>        weight-l: 0.5000
> >>        weight-t: 0.2 0.2 0.2 0.2 0.2
> >>        weight-w: -1
> >> input type is: text input
> >> Loading lexical distortion models...have 0 models
> >> Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
> >> In LanguageModelIRST::Load: nGramOrder = 2
> >> Loading LM file (no MAP)
> >> iARPA
> >> loadtxt()
> >> 1-grams: reading 3195 entries
> >> 2-grams: reading 13313 entries
> >> 3-grams: reading 20399 entries
> >> done
> >> OOV code is 3194
> >> OOV code is 3194
> >> IRST: m_unknownId=3194
> >> creating cache for storing prob, state and statesize of ngrams
> >> Finished loading LanguageModels : [1.000] seconds
> >> About to LoadPhraseTables
> >> Start loading PhraseTable /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
> >> filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >> using standard phrase tables
> >> PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
> >> Finished loading phrase tables : [1.000] seconds
> >> IO from STDOUT/STDIN
> >> Created input-output object : [1.000] seconds
> >> The score component vector looks like this:
> >> Distortion
> >> WordPenalty
> >> !UnknownWordPenalty
> >> LM_2gram
> >> PhraseModel_1
> >> PhraseModel_2
> >> PhraseModel_3
> >> PhraseModel_4
> >> PhraseModel_5
> >> Stateless: 1    Stateful: 2
> >> The global weight vector looks like this: 0.600 -1.000 1.000 0.500
> >> 0.200 0.200 0.200 0.200 0.200
> >> Translating: istuntokauden uudelleenavaaminen
> >>
> >> DecodeStep():
> >>        outputFactors=FactorMask<0>
> >>        conflictFactors=FactorMask<>
> >>        newOutputFactors=FactorMask<0>
> >> Translation Option Collection
> >>
> >>       Total translation options: 2
> >> Total translation options pruned: 0
> >> translation options spanning from  0 to 0 is 1
> >> translation options spanning from  0 to 1 is 0
> >> translation options spanning from  1 to 1 is 1
> >> translation options generated in total: 2
> >> future cost from 0 to 0 is -100.136
> >> future cost from 0 to 1 is -200.271
> >> future cost from 1 to 1 is -100.136
> >> Collecting options took 0.000 seconds
> >> added hyp to stack, best on stack, now size 1
> >> processing hypothesis from next stack
> >>
> >> creating hypothesis 1 from 0 ( ... )
> >>        base score 0.000
> >>        covering 0-0: istuntokauden
> >>        translated as: istuntokauden|UNK|UNK|UNK
> >>        score -100.136 + future cost -100.136 = -200.271
> >>        unweighted feature scores: <<0.000, -1.000, -100.000, -2.271,
> >> 0.000, 0.000, 0.000, 0.000, 0.000>>
> >> added hyp to stack, best on stack, now size 1
> >> Segmentation fault (core dumped)
> >>
> >> The only suspicious thing I found in the above is the message 'creating
> >> hypothesis 1 from 0', but I don't know whether it is the actual problem
> >> or why it is happening. I believe the problem is with the training step,
> >> since the sample models that I downloaded from
> >> http://www.statmt.org/moses/download/sample-models.tgz work fine.
> >>
> >> Prior to this, I constructed an IRST LM and used clean-corpus-n.perl to
> >> clean the decoder input. Looking at the archives, the closest message I
> >> could find was http://thread.gmane.org/gmane.comp.nlp.moses.user/1478 but
> >> I don't think I'm committing the same mistake as the author of that
> >> message.
> >>
> >> I'd be delighted if anybody could provide any insights into this
> >> problem or needs me to provide any further details.
> >>
> >> Thanks,
> >>
> >> --Sudip.
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> >
> > _______________________________________________
> > Moses-support mailing list
> > Moses-support@mit.edu
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >