Hi Sudip

Your phrase table looks fine, assuming that all the 'CEN' tokens are supposed
to be there. Unknown words don't cause Moses to crash, but I thought they
might be symptomatic of some other problem with your phrase table.
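In case it helps, here is one way to check which input tokens the phrase table actually covers (a minimal sketch: the two-line table below is made up to stand in for the real `zcat phrase-table.gz` output, and the /tmp paths are illustrative, not from this thread):

```shell
# Illustrative mini phrase table; in practice: zcat phrase-table.gz > /tmp/pt.txt
cat > /tmp/pt.txt <<'EOF'
( CEN ) ei ole ||| ( CEN ) has not ||| 1 0.0374029 1 0.010962 2.718 ||| ||| 1 1
istuntokauden ||| session ||| 1 1 1 1 2.718 ||| ||| 1 1
EOF

# The source phrase is the first |||-delimited field; split it into a
# deduplicated token vocabulary.
awk -F' \\|\\|\\| ' '{print $1}' /tmp/pt.txt | tr ' ' '\n' | sort -u > /tmp/src_vocab.txt

# Any test-sentence token absent from this vocabulary will surface as UNK
# in the decoder. This prints the uncovered tokens.
printf 'istuntokauden\nuudelleenavaaminen\n' | grep -vxF -f /tmp/src_vocab.txt
```

With this toy table only `uudelleenavaaminen` would be reported as uncovered; with a table built from a tiny training corpus, most tokens will be.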
Unfortunately, I think you'll either have to switch to Linux (can you use a
live distro, if you can't install one?) or use the internal LM,

best regards - Barry

On Friday 25 March 2011 10:15, Sudip Datta wrote:
> Hi Barry,
>
> Thanks a lot for the response.
>
> On Fri, Mar 25, 2011 at 3:31 PM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
> > Hi Sudip
> >
> > If you're using Windows, then you should use the internal LM. See here:
> > http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> > afaik this is still the case.
>
> I am using Windows only because I've been forced to :(. Shouldn't using it
> on Cygwin work the same way as on any other Linux distro?
>
> > Also, there are a couple of odd things in your setup. Firstly, you've
> > built a 3-gram LM, but you're telling Moses that it's a 2-gram:
> > > [lmodel-file]
> > > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > This shouldn't matter, but just in case you're unaware.
>
> All the while I was confused about what "order" represents in --lm
> factor:order:filename:type. Thanks for pointing out that it is the n-gram
> order.
>
> > Also, both the words in your input sentence are unknown. Did the phrase
> > table build OK? Maybe you could use zless or zcat to extract and post
> > the first few lines of it,
>
> The phrase table looks OK. Here are the first few lines:
>
> ( CEN ) ei ole pystynyt ||| ( CEN ) has not been ||| 1 0.000542073 1 0.010962 2.718 ||| ||| 1 1
> ( CEN ) ei ole ||| ( CEN ) has not ||| 1 0.0374029 1 0.010962 2.718 ||| ||| 1 1
> ( CEN ) ja Yhdistyneiden Kansakuntien talouskomission ||| CEN and within the United Nations Economic ||| 1 0.00255803 1 0.0411325 2.718 ||| ||| 1 1
> ( CEN ) ja Yhdistyneiden Kansakuntien ||| CEN and within the United Nations ||| 1 0.00639507 1 0.0616988 2.718 ||| ||| 1 1
>
> The missed terms could be because the training data is very small - I was
> just trying to get started.
> I don't think missing terms could result in the process crashing, since
> there can always be missing terms even with large training data.
>
> > best regards - Barry
>
> Thanks and regards,
> --Sudip.
>
> > On Friday 25 March 2011 08:13, Sudip Datta wrote:
> > > Hi,
> > >
> > > I am a noob at using Moses and have been trying to build a model and
> > > then use the decoder to translate test sentences. I used the following
> > > command for training:
> > >
> > > train-model.perl --root-dir /cygdrive/d/moses/fi-en/fienModel/ \
> > >   --corpus /cygdrive/d/moses/fi-en/temp/clean --f fi --e en \
> > >   --lm 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
> > >
> > > The process ended cleanly with the following moses.ini file:
> > >
> > > # input factors
> > > [input-factors]
> > > 0
> > >
> > > # mapping steps
> > > [mapping]
> > > 0 T 0
> > >
> > > # translation tables: table type (hierarchical(0), textual (0), binary (1)),
> > > # source-factors, target-factors, number of scores, file
> > > # OLD FORMAT is still handled for back-compatibility
> > > # OLD FORMAT translation tables: source-factors, target-factors,
> > > # number of scores, file
> > > # OLD FORMAT a binary table type (1) is assumed
> > > [ttable-file]
> > > 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > >
> > > # no generation models, no generation-file section
> > >
> > > # language models: type(srilm/irstlm), factors, order, file
> > > [lmodel-file]
> > > 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > >
> > > # limit on how many phrase translations e for each phrase f are loaded
> > > # 0 = all elements loaded
> > > [ttable-limit]
> > > 20
> > >
> > > # distortion (reordering) weight
> > > [weight-d]
> > > 0.6
> > >
> > > # language model weights
> > > [weight-l]
> > > 0.5000
> > >
> > > # translation model weights
> > > [weight-t]
> > > 0.2
> > > 0.2
> > > 0.2
> > > 0.2
> > > 0.2
> > >
> > > # no generation models, no weight-generation section
> > > # word penalty
> > > [weight-w]
> > > -1
> > >
> > > [distortion-limit]
> > > 6
> > >
> > > But the decoding step ends with a segfault, with the following output
> > > for -v 3:
> > >
> > > Defined parameters (per moses.ini or switch):
> > >         config: /cygdrive/d/moses/fi-en/fienModel/model/moses.ini
> > >         distortion-limit: 6
> > >         input-factors: 0
> > >         lmodel-file: 1 0 2 /cygdrive/d/moses/fi-en/en.irstlm.gz
> > >         mapping: 0 T 0
> > >         ttable-file: 0 0 0 5 /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > >         ttable-limit: 20
> > >         verbose: 100
> > >         weight-d: 0.6
> > >         weight-l: 0.5000
> > >         weight-t: 0.2 0.2 0.2 0.2 0.2
> > >         weight-w: -1
> > > input type is: text input
> > > Loading lexical distortion models...have 0 models
> > > Start loading LanguageModel /cygdrive/d/moses/fi-en/en.irstlm.gz : [0.000] seconds
> > > In LanguageModelIRST::Load: nGramOrder = 2
> > > Loading LM file (no MAP)
> > > iARPA
> > > loadtxt()
> > > 1-grams: reading 3195 entries
> > > 2-grams: reading 13313 entries
> > > 3-grams: reading 20399 entries
> > > done
> > > OOV code is 3194
> > > OOV code is 3194
> > > IRST: m_unknownId=3194
> > > creating cache for storing prob, state and statesize of ngrams
> > > Finished loading LanguageModels : [1.000] seconds
> > > About to LoadPhraseTables
> > > Start loading PhraseTable /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz : [1.000] seconds
> > > filePath: /cygdrive/d/moses/fi-en/fienModel//model/phrase-table.gz
> > > using standard phrase tables
> > > PhraseDictionaryMemory: input=FactorMask<0> output=FactorMask<0>
> > > Finished loading phrase tables : [1.000] seconds
> > > IO from STDOUT/STDIN
> > > Created input-output object : [1.000] seconds
> > > The score component vector looks like this:
> > > Distortion
> > > WordPenalty
> > > !UnknownWordPenalty
> > > LM_2gram
> > > PhraseModel_1
> > > PhraseModel_2
> > > PhraseModel_3
> > > PhraseModel_4
> > > PhraseModel_5
> > > Stateless: 1 Stateful: 2
> > > The global weight vector looks like this: 0.600 -1.000 1.000 0.500 0.200 0.200 0.200 0.200 0.200
> > > Translating: istuntokauden uudelleenavaaminen
> > >
> > > DecodeStep():
> > >         outputFactors=FactorMask<0>
> > >         conflictFactors=FactorMask<>
> > >         newOutputFactors=FactorMask<0>
> > > Translation Option Collection
> > >
> > > Total translation options: 2
> > > Total translation options pruned: 0
> > > translation options spanning from 0 to 0 is 1
> > > translation options spanning from 0 to 1 is 0
> > > translation options spanning from 1 to 1 is 1
> > > translation options generated in total: 2
> > > future cost from 0 to 0 is -100.136
> > > future cost from 0 to 1 is -200.271
> > > future cost from 1 to 1 is -100.136
> > > Collecting options took 0.000 seconds
> > > added hyp to stack, best on stack, now size 1
> > > processing hypothesis from next stack
> > >
> > > creating hypothesis 1 from 0 ( ... )
> > >         base score 0.000
> > >         covering 0-0: istuntokauden
> > >         translated as: istuntokauden|UNK|UNK|UNK
> > >         score -100.136 + future cost -100.136 = -200.271
> > >         unweighted feature scores: <<0.000, -1.000, -100.000, -2.271, 0.000, 0.000, 0.000, 0.000, 0.000>>
> > > added hyp to stack, best on stack, now size 1
> > > Segmentation fault (core dumped)
> > >
> > > The only suspicious thing I found in the above is the message 'creating
> > > hypothesis 1 from 0', but I neither know whether it is the actual
> > > problem nor why it is happening. I believe that the problem is with the
> > > training step, since the sample models that I downloaded from
> > > http://www.statmt.org/moses/download/sample-models.tgz work fine.
> > >
> > > Prior to this, I constructed an IRST LM and used clean-corpus-n.perl
> > > for cleaning the decoder input. Looking at the archives, the closest
> > > message I could find was
> > > http://thread.gmane.org/gmane.comp.nlp.moses.user/1478
> > > but I don't think I'm committing the same mistake as the author of
> > > that message.
> > >
> > > I'll be delighted if anybody could provide any insights into this
> > > problem, or asks me to provide any further details.
> > >
> > > Thanks,
> > >
> > > --Sudip.
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
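For readers hitting the same confusion about the order field: the --lm switch to train-model.perl and the generated [lmodel-file] entry line up as sketched below. The field meanings are taken from the comments in the moses.ini quoted above; changing the order to 3 is an assumption based on Barry's note that the LM was built as a 3-gram.

```shell
# train-model.perl switch:  --lm factor:order:filename:type
#   factor   = output factor the LM scores (0 = surface form)
#   order    = LM n-gram order, i.e. 3 for a 3-gram model
#   type     = LM toolkit (1 = IRSTLM)
#
#   --lm 0:3:/cygdrive/d/moses/fi-en/en.irstlm.gz:1
#
# corresponding moses.ini entry -- type(srilm/irstlm), factors, order, file:
#   [lmodel-file]
#   1 0 3 /cygdrive/d/moses/fi-en/en.irstlm.gz
```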