Hi Barry,

Thanks a lot for the response.

On Fri, Mar 25, 2011 at 3:31 PM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:

> Hi Sudip
>
> If you're using windows, then you should use the internal LM. See here:
> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> afaik this is still the case.
>

I am using Windows only because I've been forced to :(. Shouldn't running it
under Cygwin work the same way as on any other Linux distro?


>
> Also, there are a couple of odd things in your setup. Firstly, you've built a
> 3-gram LM, but you're telling moses that it's 2-gram:
> > [lmodel-file]
> > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> This shouldn't matter, but just in case you're unaware.
>
>
All along I had been confused about what 'order' stands for in --lm
factor:order:filename:type. Thanks for pointing out that it is the n-gram
order.
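
To keep things consistent with the 3-gram LM I built (the training flag did
say 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1), I'll bump the order in
moses.ini as well - if I've read the format right, the entry should look
something like:

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz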

> Also, both the words in your input sentence are unknown. Did the phrase table
> build OK? Maybe you could use zless or zcat to extract and post the first few
> lines of it,
>
>
The phrase table looks ok. Here are the first few lines:

( CEN ) ei ole pystynyt ||| ( CEN ) has not been ||| 1 0.000542073 1 0.010962 2.718 ||| ||| 1 1
( CEN ) ei ole ||| ( CEN ) has not ||| 1 0.0374029 1 0.010962 2.718 ||| ||| 1 1
( CEN ) ja Yhdistyneiden Kansakuntien talouskomission ||| CEN and within the United Nations Economic ||| 1 0.00255803 1 0.0411325 2.718 ||| ||| 1 1
( CEN ) ja Yhdistyneiden Kansakuntien ||| CEN and within the United Nations ||| 1 0.00639507 1 0.0616988 2.718 ||| ||| 1 1
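
For reference, I extracted these roughly as you suggested, with something like:

zcat /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz | head -4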

The missing terms are probably because the training data is very small - I was
just trying to get started. I don't think missing terms should make the process
crash, since there can always be missing terms even with large training data.


> best regards - Barry
>

Thanks and regards,

--Sudip.

>
> On Friday 25 March 2011 08:13, Sudip Datta wrote:
> > Hi,
> >
> > I am a noob at using Moses and have been trying to build a model and then
> > use the decoder to translate test sentences. I used the following command
> > for training:
> >
> > train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ --corpus
> > /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
> > 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
> >
> > The process ended cleanly with the following moses.ini file:
> >
> > # input factors
> > [input-factors]
> > 0
> >
> > # mapping steps
> > [mapping]
> > 0 T 0
> >
> > # translation tables: table type (hierarchical(0), textual (0), binary
> > (1)), source-factors, target-factors, number of scores, file
> > # OLD FORMAT is still handled for back-compatibility
> > # OLD FORMAT translation tables: source-factors, target-factors, number of
> > scores, file
> > # OLD FORMAT a binary table type (1) is assumed
> > [ttable-file]
> > 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >
> > # no generation models, no generation-file section
> >
> > # language models: type(srilm/irstlm), factors, order, file
> > [lmodel-file]
> > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >
> >
> > # limit on how many phrase translations e for each phrase f are loaded
> > # 0 = all elements loaded
> > [ttable-limit]
> > 20
> >
> > # distortion (reordering) weight
> > [weight-d]
> > 0.6
> >
> > # language model weights
> > [weight-l]
> > 0.5000
> >
> >
> > # translation model weights
> > [weight-t]
> > 0.2
> > 0.2
> > 0.2
> > 0.2
> > 0.2
> >
> > # no generation models, no weight-generation section
> >
> > # word penalty
> > [weight-w]
> > -1
> >
> > [distortion-limit]
> > 6
> >
> > But the decoding step ends with a segfault, with the following output for
> > -v 3:
> >
> > Defined parameters (per moses.ini or switch):
> >         config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> >         distortion-limit: 6
> >         input-factors: 0
> >         lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >         mapping: 0 T 0
> >         ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >         ttable-limit: 20
> >         verbose: 100
> >         weight-d: 0.6
> >         weight-l: 0.5000
> >         weight-t: 0.2 0.2 0.2 0.2 0.2
> >         weight-w: -1
> > input type is: text input
> > Loading lexical distortion models...have 0 models
> > Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
> > In LanguageModelIRST::Load: nGramOrder = 2
> > Loading LM file (no MAP)
> > iARPA
> > loadtxt()
> > 1-grams: reading 3195 entries
> > 2-grams: reading 13313 entries
> > 3-grams: reading 20399 entries
> > done
> > OOV code is 3194
> > OOV code is 3194
> > IRST: m_unknownId=3194
> > creating cache for storing prob, state and statesize of ngrams
> > Finished loading LanguageModels : [1.000] seconds
> > About to LoadPhraseTables
> > Start loading PhraseTable /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
> > filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > using standard phrase tables
> > PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
> > Finished loading phrase tables : [1.000] seconds
> > IO from STDOUT/STDIN
> > Created input-output object : [1.000] seconds
> > The score component vector looks like this:
> > Distortion
> > WordPenalty
> > !UnknownWordPenalty
> > LM_2gram
> > PhraseModel_1
> > PhraseModel_2
> > PhraseModel_3
> > PhraseModel_4
> > PhraseModel_5
> > Stateless: 1    Stateful: 2
> > The global weight vector looks like this: 0.600 -1.000 1.000 0.500 0.200 0.200 0.200 0.200 0.200
> > Translating: istuntokauden uudelleenavaaminen
> >
> > DecodeStep():
> >         outputFactors=FactorMask<0>
> >         conflictFactors=FactorMask<>
> >         newOutputFactors=FactorMask<0>
> > Translation Option Collection
> >
> >        Total translation options: 2
> > Total translation options pruned: 0
> > translation options spanning from  0 to 0 is 1
> > translation options spanning from  0 to 1 is 0
> > translation options spanning from  1 to 1 is 1
> > translation options generated in total: 2
> > future cost from 0 to 0 is -100.136
> > future cost from 0 to 1 is -200.271
> > future cost from 1 to 1 is -100.136
> > Collecting options took 0.000 seconds
> > added hyp to stack, best on stack, now size 1
> > processing hypothesis from next stack
> >
> > creating hypothesis 1 from 0 ( ... )
> >         base score 0.000
> >         covering 0-0: istuntokauden
> >         translated as: istuntokauden|UNK|UNK|UNK
> >         score -100.136 + future cost -100.136 = -200.271
> >         unweighted feature scores: <<0.000, -1.000, -100.000, -2.271, 0.000, 0.000, 0.000, 0.000, 0.000>>
> > added hyp to stack, best on stack, now size 1
> > Segmentation fault (core dumped)
> >
> > The only suspicious thing I found in the above is the message 'creating
> > hypothesis 1 from 0', but I don't know whether it is the actual problem or
> > why it is happening. I believe the problem is with the training step, since
> > the sample models that I downloaded from
> > http://www.statmt.org/moses/download/sample-models.tgz work fine.
> >
> > Prior to this, I constructed an IRST LM and used clean-corpus-n.perl for
> > cleaning the decoder input. Looking at the archives, the closest message I
> > could find was http://thread.gmane.org/gmane.comp.nlp.moses.user/1478 but I
> > don't think I'm committing the same mistake as the author of that message.
> >
> > I'll be delighted if anybody could provide any insights into this problem
> > or needs me to provide any further details.
> >
> > Thanks,
> >
> > --Sudip.
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
