Hi Barry,

Thanks a lot for the response.
On Fri, Mar 25, 2011 at 3:31 PM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
> Hi Sudip
>
> If you're using windows, then you should use the internal LM. See here:
> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> afaik this is still the case.

I am using Windows only because I've been forced to :(. Shouldn't running it under Cygwin work the same way as on any other Linux distro?

> Also, there are a couple of odd things in your setup. Firstly, you've built a
> 3-gram LM, but you're telling moses that it's 2-gram:
>
> [lmodel-file]
> 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
>
> This shouldn't matter, but just in case you're unaware.

All the while I was confused about what "order" represents in --lm factor:order:filename:type. Thanks for pointing out that it is the n-gram order.
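If I'm reading the comment in the generated moses.ini correctly ("# language models: type(srilm/irstlm), factors, order, file"), the fix on my side should just be changing the order field to match the 3-gram LM, i.e. something like:

    [lmodel-file]
    1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz

I'll change that, even if, as you say, it shouldn't be what's causing the crash.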
> Also, both the words in your input sentence are unknown. Did the phrase table
> build OK? Maybe you could use zless or zcat to extract and post the first few
> lines of it,

The phrase table looks ok. Here are the first few lines:

( CEN ) ei ole pystynyt ||| ( CEN ) has not been ||| 1 0.000542073 1 0.010962 2.718 ||| ||| 1 1
( CEN ) ei ole ||| ( CEN ) has not ||| 1 0.0374029 1 0.010962 2.718 ||| ||| 1 1
( CEN ) ja Yhdistyneiden Kansakuntien talouskomission ||| CEN and within the United Nations Economic ||| 1 0.00255803 1 0.0411325 2.718 ||| ||| 1 1
( CEN ) ja Yhdistyneiden Kansakuntien ||| CEN and within the United Nations ||| 1 0.00639507 1 0.0616988 2.718 ||| ||| 1 1
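(For the record, I pulled these out with something along the lines of

    zcat /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz | head -n 4

per your suggestion.)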
The missing terms could be because the training data is very small - I was just trying to get started. In any case, I don't think missing terms could cause the process to crash, since there can always be missing terms even with large training data.

> best regards - Barry

Thanks and regards,
--Sudip.

> On Friday 25 March 2011 08:13, Sudip Datta wrote:
> > Hi,
> >
> > I am a noob at using Moses and have been trying to build a model and then
> > use the decoder to translate test sentences. I used the following command
> > for training:
> >
> > train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ --corpus
> > /cygdrive/d/moses/fi-en/temp/clean --f fi --e en --lm
> > 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
> >
> > The process ended cleanly with the following moses.ini file:
> >
> > # input factors
> > [input-factors]
> > 0
> >
> > # mapping steps
> > [mapping]
> > 0 T 0
> >
> > # translation tables: table type (hierarchical(0), textual (0), binary (1)),
> > source-factors, target-factors, number of scores, file
> > # OLD FORMAT is still handled for back-compatibility
> > # OLD FORMAT translation tables: source-factors, target-factors, number of
> > scores, file
> > # OLD FORMAT a binary table type (1) is assumed
> > [ttable-file]
> > 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> >
> > # no generation models, no generation-file section
> >
> > # language models: type(srilm/irstlm), factors, order, file
> > [lmodel-file]
> > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> >
> > # limit on how many phrase translations e for each phrase f are loaded
> > # 0 = all elements loaded
> > [ttable-limit]
> > 20
> >
> > # distortion (reordering) weight
> > [weight-d]
> > 0.6
> >
> > # language model weights
> > [weight-l]
> > 0.5000
> >
> > # translation model weights
> > [weight-t]
> > 0.2
> > 0.2
> > 0.2
> > 0.2
> > 0.2
> >
> > # no generation models, no weight-generation section
> >
> > # word penalty
> > [weight-w]
> > -1
> >
> > [distortion-limit]
> > 6
> >
> > But the decoding step ends with a segfault, with the following output for -v 3:
> >
> > Defined parameters (per moses.ini or switch):
> > config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> > distortion-limit: 6
> > input-factors: 0
> > lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > mapping: 0 T 0
> > ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > ttable-limit: 20
> > verbose: 100
> > weight-d: 0.6
> > weight-l: 0.5000
> > weight-t: 0.2 0.2 0.2 0.2 0.2
> > weight-w: -1
> > input type is: text input
> > Loading lexical distortion models...have 0 models
> > Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
> > In LanguageModelIRST::Load: nGramOrder = 2
> > Loading LM file (no MAP)
> > iARPA
> > loadtxt()
> > 1-grams: reading 3195 entries
> > 2-grams: reading 13313 entries
> > 3-grams: reading 20399 entries
> > done
> > OOV code is 3194
> > OOV code is 3194
> > IRST: m_unknownId=3194
> > creating cache for storing prob, state and statesize of ngrams
> > Finished loading LanguageModels : [1.000] seconds
> > About to LoadPhraseTables
> > Start loading PhraseTable /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
> > filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > using standard phrase tables
> > PhraseDictionaryMemory: input=FactorMask<0> output=FactorMask<0>
> > Finished loading phrase tables : [1.000] seconds
> > IO from STDOUT/STDIN
> > Created input-output object : [1.000] seconds
> > The score component vector looks like this:
> > Distortion
> > WordPenalty
> > !UnknownWordPenalty
> > LM_2gram
> > PhraseModel_1
> > PhraseModel_2
> > PhraseModel_3
> > PhraseModel_4
> > PhraseModel_5
> > Stateless: 1 Stateful: 2
> > The global weight vector looks like this: 0.600 -1.000 1.000 0.500 0.200 0.200 0.200 0.200 0.200
> > Translating: istuntokauden uudelleenavaaminen
> >
> > DecodeStep():
> > outputFactors=FactorMask<0>
> > conflictFactors=FactorMask<>
> > newOutputFactors=FactorMask<0>
> > Translation Option Collection
> >
> > Total translation options: 2
> > Total translation options pruned: 0
> > translation options spanning from 0 to 0 is 1
> > translation options spanning from 0 to 1 is 0
> > translation options spanning from 1 to 1 is 1
> > translation options generated in total: 2
> > future cost from 0 to 0 is -100.136
> > future cost from 0 to 1 is -200.271
> > future cost from 1 to 1 is -100.136
> > Collecting options took 0.000 seconds
> > added hyp to stack, best on stack, now size 1
> > processing hypothesis from next stack
> >
> > creating hypothesis 1 from 0 ( ... )
> > base score 0.000
> > covering 0-0: istuntokauden
> > translated as: istuntokauden|UNK|UNK|UNK
> > score -100.136 + future cost -100.136 = -200.271
> > unweighted feature scores: <<0.000, -1.000, -100.000, -2.271, 0.000, 0.000, 0.000, 0.000, 0.000>>
> > added hyp to stack, best on stack, now size 1
> > Segmentation fault (core dumped)
> >
> > The only suspicious thing I found in the above is the message 'creating
> > hypothesis 1 from 0', but I don't know whether it is the actual problem or
> > why it is happening. I believe the problem is with the training step, since
> > the sample models that I downloaded from
> > http://www.statmt.org/moses/download/sample-models.tgz work fine.
> >
> > Prior to this, I constructed an IRST LM and used clean-corpus-n.perl for
> > cleaning the decoder input. Looking at the archives, the closest message I
> > could find was http://thread.gmane.org/gmane.comp.nlp.moses.user/1478 but I
> > don't think I'm committing the same mistake as the author of that message.
> >
> > I'd be delighted if anybody could provide any insights into this problem,
> > and I'm happy to supply any further details.
> >
> > Thanks,
> >
> > --Sudip.
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
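P.S. For completeness, the preprocessing I ran before training was roughly the following (the raw corpus path and the min/max sentence lengths are from memory, so treat this as a sketch rather than the exact commands):

    # Clean the parallel corpus; temp/corpus.{fi,en} is the raw corpus
    # (name from memory), temp/clean.{fi,en} is what --corpus pointed at above.
    clean-corpus-n.perl /cygdrive/d/moses/fi-en/temp/corpus fi en /cygdrive/d/moses/fi-en/temp/clean 1 40

    # Build the 3-gram English LM with IRSTLM.
    build-lm.sh -i /cygdrive/d/moses/fi-en/temp/clean.en -n 3 -o /cygdrive/d/moses/fi-en/en.irstlm.gz

Please shout if anything there looks off.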
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support