Hi Sudip

Your phrase table looks fine, assuming that all the 'CEN' tokens are 
supposed to be there. Unknown words don't cause Moses to crash, but I 
thought they might be symptomatic of some other problem with your phrase 
table. 

Unfortunately, I think you'll either have to switch to Linux (can you use a 
live distro if you can't install one?) or use the internal LM.

best regards - Barry

On Friday 25 March 2011 10:15, Sudip Datta wrote:
> Hi Barry,
>
> Thanks a lot for the response.
>
> On Fri, Mar 25, 2011 at 3:31 PM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
> > Hi Sudip
> >
> > If you're using windows, then you should use the internal LM. See here:
> > http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> > afaik this is still the case.
>
> I am using Windows only because I've been forced to :(. Shouldn't running
> it on Cygwin work the same way as on a Linux distro?
>
> > Also, there are a couple of odd things in your setup. Firstly, you've
> > built a 3-gram LM, but you're telling Moses that it's 2-gram:
> > > [lmodel-file]
> > > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >
> > This shouldn't matter, but just in case you're unaware.
>
> All this while I was confused about what 'order' represents in --lm
> factor:order:filename:type. Thanks for pointing out that it is the n-gram
> order.
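As an aside, the four fields of that --lm argument can be pulled apart mechanically. A minimal sketch (the spec string is the one from the training command below; the variable names are just for illustration, and the split assumes the filename itself contains no ':'):

```python
# Split a Moses --lm spec of the form factor:order:filename:type.
# Assumes the filename contains no ':' (true for the POSIX-style path here).
spec = "0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1"
factor, order, filename, lm_type = spec.split(":", 3)

print(order)  # the n-gram order, which should match the [lmodel-file] line
```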
>
> > Also, both the words in your input sentence are unknown. Did the phrase
> > table build OK? Maybe you could use zless or zcat to extract and post the
> > first few lines of it,
>
> The phrase table looks ok. Here are the first few lines:
>
> ( CEN ) ei ole pystynyt ||| ( CEN ) has not been ||| 1 0.000542073 1
> 0.010962 2.718 ||| ||| 1 1
> ( CEN ) ei ole ||| ( CEN ) has not ||| 1 0.0374029 1 0.010962 2.718 ||| |||
> 1 1
> ( CEN ) ja Yhdistyneiden Kansakuntien talouskomission ||| CEN and within
> the United Nations Economic ||| 1 0.00255803 1 0.0411325 2.718 ||| ||| 1 1
> ( CEN ) ja Yhdistyneiden Kansakuntien ||| CEN and within the United
> Nations ||| 1 0.00639507 1 0.0616988 2.718 ||| ||| 1 1
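Those lines follow the usual Moses phrase-table layout: source phrase ||| target phrase ||| feature scores ||| word alignment ||| counts. A minimal sketch of pulling one apart (the line is taken from the excerpt above; the constant 2.718 is, as far as I know, the phrase-penalty feature, e^1):

```python
line = ("( CEN ) ei ole ||| ( CEN ) has not ||| "
        "1 0.0374029 1 0.010962 2.718 ||| ||| 1 1")
# Split on the ||| delimiter and trim the surrounding whitespace
fields = [f.strip() for f in line.split("|||")]
source, target = fields[0], fields[1]
scores = [float(s) for s in fields[2].split()]

print(source)       # ( CEN ) ei ole
print(len(scores))  # 5, matching "number of scores" in [ttable-file]
```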
>
> The missing terms could be because the training data is very small - I was
> just trying to get started. I don't think missing terms could cause the
> process to crash, since there can always be missing terms even with large
> training data.
>
> > best regards - Barry
>
> Thanks and regards,
>
> --Sudip.
>
> > On Friday 25 March 2011 08:13, Sudip Datta wrote:
> > > Hi,
> > >
> > > I am a noob at using Moses and have been trying to build a model and
> > > then use the decoder to translate test sentences. I used the following
> > > command for training:
> > >
> > > train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ --corpus
> > > /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
> > > 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
> > >
> > > The process ended cleanly with the following moses.ini file:
> > >
> > > # input factors
> > > [input-factors]
> > > 0
> > >
> > > # mapping steps
> > > [mapping]
> > > 0 T 0
> > >
> > > # translation tables: table type (hierarchical(0), textual (0), binary
> > > (1)), source-factors, target-factors, number of scores, file
> > > # OLD FORMAT is still handled for back-compatibility
> > > # OLD FORMAT translation tables: source-factors, target-factors,
> > > number of scores, file
> > > # OLD FORMAT a binary table type (1) is assumed
> > > [ttable-file]
> > > 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > >
> > > # no generation models, no generation-file section
> > >
> > > # language models: type(srilm/irstlm), factors, order, file
> > > [lmodel-file]
> > > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > >
> > >
> > > # limit on how many phrase translations e for each phrase f are loaded
> > > # 0 = all elements loaded
> > > [ttable-limit]
> > > 20
> > >
> > > # distortion (reordering) weight
> > > [weight-d]
> > > 0.6
> > >
> > > # language model weights
> > > [weight-l]
> > > 0.5000
> > >
> > >
> > > # translation model weights
> > > [weight-t]
> > > 0.2
> > > 0.2
> > > 0.2
> > > 0.2
> > > 0.2
> > >
> > > # no generation models, no weight-generation section
> > >
> > > # word penalty
> > > [weight-w]
> > > -1
> > >
> > > [distortion-limit]
> > > 6
> > >
> > > But the decoding step ends with a segfault, with the following output
> > > for -v 3:
> > > Defined parameters (per moses.ini or switch):
> > >         config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> > >         distortion-limit: 6
> > >         input-factors: 0
> > >         lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > >         mapping: 0 T 0
> > >         ttable-file: 0 0 0 5
> > > /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > >         ttable-limit: 20
> > >         verbose: 100
> > >         weight-d: 0.6
> > >         weight-l: 0.5000
> > >         weight-t: 0.2 0.2 0.2 0.2 0.2
> > >         weight-w: -1
> > > input type is: text input
> > > Loading lexical distortion models...have 0 models
> > > Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz :
> > > [0.000] seconds
> > > In LanguageModelIRST::Load: nGramOrder = 2
> > > Loading LM file (no MAP)
> > > iARPA
> > > loadtxt()
> > > 1-grams: reading 3195 entries
> > > 2-grams: reading 13313 entries
> > > 3-grams: reading 20399 entries
> > > done
> > > OOV code is 3194
> > > OOV code is 3194
> > > IRST: m_unknownId=3194
> > > creating cache for storing prob, state and statesize of ngrams
> > > Finished loading LanguageModels : [1.000] seconds
> > > About to LoadPhraseTables
> > > Start loading PhraseTable
> > > /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000]
> > > seconds
> > > filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > > using standard phrase tables
> > > PhraseDictionaryMemory: input=FactorMask<0>  output=FactorMask<0>
> > > Finished loading phrase tables : [1.000] seconds
> > > IO from STDOUT/STDIN
> > > Created input-output object : [1.000] seconds
> > > The score component vector looks like this:
> > > Distortion
> > > WordPenalty
> > > !UnknownWordPenalty
> > > LM_2gram
> > > PhraseModel_1
> > > PhraseModel_2
> > > PhraseModel_3
> > > PhraseModel_4
> > > PhraseModel_5
> > > Stateless: 1    Stateful: 2
> > > The global weight vector looks like this: 0.600 -1.000 1.000 0.500
> > > 0.200 0.200 0.200 0.200 0.200
> > > Translating: istuntokauden uudelleenavaaminen
> > >
> > > DecodeStep():
> > >         outputFactors=FactorMask<0>
> > >         conflictFactors=FactorMask<>
> > >         newOutputFactors=FactorMask<0>
> > > Translation Option Collection
> > >
> > >        Total translation options: 2
> > > Total translation options pruned: 0
> > > translation options spanning from  0 to 0 is 1
> > > translation options spanning from  0 to 1 is 0
> > > translation options spanning from  1 to 1 is 1
> > > translation options generated in total: 2
> > > future cost from 0 to 0 is -100.136
> > > future cost from 0 to 1 is -200.271
> > > future cost from 1 to 1 is -100.136
> > > Collecting options took 0.000 seconds
> > > added hyp to stack, best on stack, now size 1
> > > processing hypothesis from next stack
> > >
> > > creating hypothesis 1 from 0 ( ... )
> > >         base score 0.000
> > >         covering 0-0: istuntokauden
> > >         translated as: istuntokauden|UNK|UNK|UNK
> > >         score -100.136 + future cost -100.136 = -200.271
> > >         unweighted feature scores: <<0.000, -1.000, -100.000, -2.271,
> > > 0.000, 0.000, 0.000, 0.000, 0.000>>
> > > added hyp to stack, best on stack, now size 1
> > > Segmentation fault (core dumped)
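Incidentally, the reported hypothesis score in that trace is just the dot product of the printed global weight vector with the unweighted feature scores, so the -100 there is the expected unknown-word penalty rather than a sign of the crash. A quick check, with the numbers copied from the -v 3 output above in the order of the score component vector:

```python
# Order: Distortion, WordPenalty, UnknownWordPenalty, LM_2gram,
# then the five PhraseModel features (all zero for this hypothesis).
weights  = [0.6, -1.0, 1.0, 0.5] + [0.2] * 5
features = [0.0, -1.0, -100.0, -2.271] + [0.0] * 5

score = sum(w * f for w, f in zip(weights, features))
# score comes out to about -100.136, matching the trace; the -100 is the
# unknown-word penalty applied once, scaled by its weight of 1.0.
```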
> > >
> > > The only suspicious thing I found in the above is the message 'creating
> > > hypothesis 1 from 0', but I don't know whether it is the actual problem
> > > or why it is happening. I believe the problem is with the training
> > > step, since the sample models that I downloaded from
> > > http://www.statmt.org/moses/download/sample-models.tgz work fine.
> > >
> > > Prior to this, I constructed an IRST LM and used clean-corpus-n.perl to
> > > clean the decoder input. Looking at the archives, the closest message I
> > > could find was
> > > http://thread.gmane.org/gmane.comp.nlp.moses.user/1478 but I don't think
> > > I'm committing the same mistake as the author of that message.
> >
> > > I'd be delighted if anybody could provide any insights into this
> > > problem, or let me know if I can provide any further details.
> > >
> > > Thanks,
> > >
> > > --Sudip.
> >

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
