If you've compiled with gcc in Cygwin, you can use any LM. The
stipulation about using only the internal LM only applies if you use
Visual Studio.
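
If it helps, compiling with LM support under Cygwin was, from memory,
something along these lines (the paths below are placeholders, and the
option names may differ slightly between Moses versions):

  # regenerate the autotools files, then point configure at the LM toolkits
  ./regenerate-makefiles.sh
  ./configure --with-srilm=/path/to/srilm --with-irstlm=/path/to/irstlm
  make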

However, I would personally use SRILM to start with, as I'm not sure
whether the other LMs are fully tested on Cygwin.
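
If you do switch, I think the only change needed in moses.ini is the LM
type code in [lmodel-file] (0 = SRILM, 1 = IRSTLM). For example,
something like the following, where the en.srilm.gz path is just a
placeholder for wherever your SRILM-built model lives:

  # type (0 = SRILM), factor, order, file
  [lmodel-file]
  0 0 3 /cygdrive/d/moses/fi-en/en.srilm.gz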

Hieu
Sent from my flying horse

On 25 Mar 2011, at 10:06 AM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:

> Hi Sudip
>
> If you're using Windows, then you should use the internal LM. See here:
> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> AFAIK this is still the case.
>
> Also, there are a couple of odd things in your setup. Firstly, you've built a
> 3-gram LM, but you're telling Moses that it's a 2-gram:
>> [lmodel-file]
>> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> This shouldn't matter, but just in case you're unaware.
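>
> If you do want a 3-gram model, the order is the third field on that line, so I
> believe it should read:
>
> [lmodel-file]
> 1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz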
>
> Also, both of the words in your input sentence are unknown. Did the phrase
> table build OK? Maybe you could use zless or zcat to extract and post the
> first few lines of it.
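> For example, something like this should work from the Cygwin shell:
>
>   zcat /cygdrive/d/moses/fi-en/fienModel/model/phrase-table.gz | head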
>
> best regards - Barry
>
> On Friday 25 March 2011 08:13, Sudip Datta wrote:
>> Hi,
>>
>> I am a noob at using Moses and have been trying to build a model and then
>> use the decoder to translate test sentences. I used the following command
>> for training:
>>
>> train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ --corpus
>> /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
>> 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
>>
>> The process ended cleanly with the following moses.ini file:
>>
>> # input factors
>> [input-factors]
>> 0
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>>
>> # translation tables: table type (hierarchical(0), textual (0), binary
>> (1)), source-factors, target-factors, number of scores, file
>> # OLD FORMAT is still handled for back-compatibility
>> # OLD FORMAT translation tables: source-factors, target-factors, number of
>> scores, file
>> # OLD FORMAT a binary table type (1) is assumed
>> [ttable-file]
>> 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
>>
>> # no generation models, no generation-file section
>>
>> # language models: type(srilm/irstlm), factors, order, file
>> [lmodel-file]
>> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
>>
>>
>> # limit on how many phrase translations e for each phrase f are loaded
>> # 0 = all elements loaded
>> [ttable-limit]
>> 20
>>
>> # distortion (reordering) weight
>> [weight-d]
>> 0.6
>>
>> # language model weights
>> [weight-l]
>> 0.5000
>>
>>
>> # translation model weights
>> [weight-t]
>> 0.2
>> 0.2
>> 0.2
>> 0.2
>> 0.2
>>
>> # no generation models, no weight-generation section
>>
>> # word penalty
>> [weight-w]
>> -1
>>
>> [distortion-limit]
>> 6
>>
>> But the decoding step ends with a segfault, with the following output for -v 3:
>>
>> Defined parameters (per moses.ini or switch):
>>        config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
>>        distortion-limit: 6
>>        input-factors: 0
>>        lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
>>        mapping: 0 T 0
>>        ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
>>        ttable-limit: 20
>>        verbose: 100
>>        weight-d: 0.6
>>        weight-l: 0.5000
>>        weight-t: 0.2 0.2 0.2 0.2 0.2
>>        weight-w: -1
>> input type is: text input
>> Loading lexical distortion models...have 0 models
>> Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
>> In LanguageModelIRST::Load: nGramOrder = 2
>> Loading LM file (no MAP)
>> iARPA
>> loadtxt()
>> 1-grams: reading 3195 entries
>> 2-grams: reading 13313 entries
>> 3-grams: reading 20399 entries
>> done
>> OOV code is 3194
>> OOV code is 3194
>> IRST: m_unknownId=3194
>> creating cache for storing prob, state and statesize of ngrams
>> Finished loading LanguageModels : [1.000] seconds
>> About to LoadPhraseTables
>> Start loading PhraseTable
>> /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
>> filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
>> using standard phrase tables
>> PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
>> Finished loading phrase tables : [1.000] seconds
>> IO from STDOUT/STDIN
>> Created input-output object : [1.000] seconds
>> The score component vector looks like this:
>> Distortion
>> WordPenalty
>> !UnknownWordPenalty
>> LM_2gram
>> PhraseModel_1
>> PhraseModel_2
>> PhraseModel_3
>> PhraseModel_4
>> PhraseModel_5
>> Stateless: 1    Stateful: 2
>> The global weight vector looks like this: 0.600 -1.000 1.000 0.500 0.200 0.200 0.200 0.200 0.200
>> Translating: istuntokauden uudelleenavaaminen
>>
>> DecodeStep():
>>        outputFactors=FactorMask<0>
>>        conflictFactors=FactorMask<>
>>        newOutputFactors=FactorMask<0>
>> Translation Option Collection
>>
>>       Total translation options: 2
>> Total translation options pruned: 0
>> translation options spanning from  0 to 0 is 1
>> translation options spanning from  0 to 1 is 0
>> translation options spanning from  1 to 1 is 1
>> translation options generated in total: 2
>> future cost from 0 to 0 is -100.136
>> future cost from 0 to 1 is -200.271
>> future cost from 1 to 1 is -100.136
>> Collecting options took 0.000 seconds
>> added hyp to stack, best on stack, now size 1
>> processing hypothesis from next stack
>>
>> creating hypothesis 1 from 0 ( ... )
>>        base score 0.000
>>        covering 0-0: istuntokauden
>>        translated as: istuntokauden|UNK|UNK|UNK
>>        score -100.136 + future cost -100.136 = -200.271
>>        unweighted feature scores: <<0.000, -1.000, -100.000, -2.271, 0.000, 0.000, 0.000, 0.000, 0.000>>
>> added hyp to stack, best on stack, now size 1
>> Segmentation fault (core dumped)
>>
>> The only suspicious thing I found in the above is the message 'creating
>> hypothesis 1 from 0', but I don't know whether that is the actual problem or
>> why it is happening. I believe the problem is with the training step, since
>> the sample models that I downloaded from
>> http://www.statmt.org/moses/download/sample-models.tgz work fine.
>>
>> Prior to this, I constructed an IRST LM and used clean-corpus-n.perl to clean
>> the decoder input. Looking at the archives, the closest message I could find
>> was http://thread.gmane.org/gmane.comp.nlp.moses.user/1478 but I don't think
>> I'm making the same mistake as the author of that message.
>>
>> I'd be delighted if anybody could provide any insights into this problem, and
>> I'm happy to provide any further details.
>>
>> Thanks,
>>
>> --Sudip.
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
