Re: [Moses-support] tree-to-string problem
Your training data should be in a format that Moses understands, e.g. the tree markup for "the cat". Currently, it looks like the training data is whatever came out of the parser. The syntax tutorial has a bit more information:
http://www.statmt.org/moses/?n=Moses.SyntaxTutorial

On 18/04/2016 14:07, Annette Rios wrote:
> Hi all,
>
> I'm trying to build a tree-to-string system, and I get this error from moses_chart:
>
> Exception: moses/Phrase.cpp:214 in void Moses::Phrase::CreateFromString(Moses::FactorDirection, const std::vector&, const StringPiece&, Moses::Word**) threw util::Exception because `nextPos == string::npos'.
> Incorrect formatting of non-terminal. Should have 2 non-terms, eg. [X][X]. Current string: [SP]
>
> The corresponding lines in the phrase table look like this:
>
> [S [AQ asumidos] [cag [sp [SP con] [sn [NP áfrica]]] [conj [CC y]] [SP] [sn [NP áfrica ||| und [X][X] Afrika [X] ||| 0.0874939 0.69856 0.174988 0.36 0.606531 ||| 3-0 4-1 5-2 ||| 4 2 2 ||| |||
> [S [AQ asumidos] [cag [sp [SP con] [sn [NP áfrica]]] [conj [CC y]] [SP] [sn [NP ||| und [X][X] [X][X] [X] ||| 0.00185172 0.838272 0.174988 0.865553 0.606531 ||| 3-0 4-1 5-2 ||| 189 2 2 ||| |||
> [S [AQ asumidos] [cag [sp [SP con] [sn [NP áfrica]]] [conj [CC y]] [SP] [sn]]] ||| und [X][X] [X][X] [X] ||| 0.00185172 0.838272 0.174988 0.865553 0.606531 ||| 3-0 4-1 5-2 ||| 189 2 2 ||| |||
>
> extracted from this parse:
>
> 4 asumidos asumido a AQ gen=m|num=p|postype=qualificative|eagles=AQ0MPP 3 S _ _
> 5 con con s SP postype=preposition|eagles=SPS00 8 sp _ _
> 6 áfrica áfrica n NP postype=proper||eagles=NP0 5 sn _ _
> 7 y y c CC postype=coordinating|eagles=CC 8 conj _ _
> 8 por por s SP postype=preposition|eagles=SPS00 4 cag _ _
> 9 áfrica áfrica n NP postype=proper||eagles=NP0 8 sn _ _
>
> converted to xml with conll2mosesxml.py:
>
> asumidos
> con
> áfrica
> y
> por
> áfrica
>
> Is there something wrong in my parse trees that causes this?
> Best regards,
> Annette

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
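The exception above fires whenever a token that looks like a non-terminal is not a proper source/target label pair such as `[X][X]` or `[NP][X]`. A quick way to find offending rules before handing a table to moses_chart is to scan each side for lone bracketed labels. The sketch below is illustrative Python, not Moses code, and it assumes the convention that the final token of a rule side is the rule's LHS label (e.g. `[S]`), which is legitimately unpaired:

```python
import re

# A well-formed non-terminal inside a rule body pairs a source label with a
# target label, e.g. "[X][X]" or "[NP][X]". A lone "[SP]" is exactly what
# triggers "Incorrect formatting of non-terminal" in moses/Phrase.cpp.
NT_PAIR = re.compile(r'^\[[^\[\]]+\]\[[^\[\]]+\]$')

def find_bad_nonterminals(side):
    """Return tokens that look like non-terminals but are not label pairs.
    The last token is skipped: it is taken to be the rule's LHS label."""
    toks = side.split()
    bad = []
    for tok in toks[:-1]:
        if tok.startswith('[') and tok.endswith(']') and not NT_PAIR.match(tok):
            bad.append(tok)
    return bad

# A lone parse bracket like [SP] is caught; a proper pair [NP][X] passes.
print(find_bad_nonterminals('[SP] con [NP][X] áfrica [S]'))  # -> ['[SP]']
print(find_bad_nonterminals('und [X][X] Afrika [X]'))        # -> []
```

Running this over both sides of every `|||`-separated field would pinpoint the rules carrying raw parser output.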
Re: [Moses-support] Phrase Table error
The word [कलकत्ता] has been confused with a non-terminal by Moses. You must escape your training and input data before giving it to Moses. You can escape your data using the script scripts/tokenizer/escape-special-chars.perl or with the tokenizer script scripts/tokenizer/tokenizer.perl.

On 18/04/2016 17:01, Akhilesh Gupta wrote:

Hello Sir,

I was trying to run Moses using already generated models, but I got this error:

hieu@hieu-VirtualBox:~/workspace/github/working$ /home/hieu/workspace/github/mosesdecoder/bin/moses -f india/en-mr/moses.ini
Defined parameters (per moses.ini or switch):
  config: india/en-mr/moses.ini
  distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/hieu/workspace/github/working/india/en-mr/reordering-table.wbe-msd-bidirectional-fe.gz
  distortion-limit: 6
  input-factors: 0
  lmodel-file: 1 0 5 /home/hieu/workspace/github/working/india/mr/mr.lm
  mapping: 0 T 0
  ttable-file: 0 0 0 5 /home/hieu/workspace/github/working/india/en-mr/phrase-table.gz
  ttable-limit: 20
  weight-d: 0.0655118 0.10091 0.0237089 0.0746748 0.0667524 0.0398009 0.0216711
  weight-l: 0.15864
  weight-t: 0.0294934 0.0740486 -6.53905e-05 0.00500778 0.281338
  weight-w: 0.0583774
line=IRSTLM factor=0 order=5 num-features=1 path=/home/hieu/workspace/github/working/india/mr/mr.lm
FeatureFunction: IRSTLM0 start: 0 end: 0
line=Distortion
FeatureFunction: Distortion0 start: 1 end: 1
line=LexicalReordering type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 num-features=6 path=/home/hieu/workspace/github/working/india/en-mr/reordering-table.wbe-msd-bidirectional-fe.gz
FeatureFunction: LexicalReordering0 start: 2 end: 7
Initializing LexicalReordering..
line=WordPenalty
FeatureFunction: WordPenalty0 start: 8 end: 8
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 9 end: 9
line=PhraseDictionaryMemory input-factor=0 output-factor=0 path=/home/hieu/workspace/github/working/india/en-mr/phrase-table.gz num-features=5 table-limit=20
FeatureFunction: PhraseDictionaryMemory0 start: 10 end: 14
Loading IRSTLM0
In LanguageModelIRST::Load: nGramOrder = 5
Language Model Type of /home/hieu/workspace/github/working/india/mr/mr.lm is 1
Language Model Type is 1
\data\
loadtxt_ram()
1-grams: reading 89171 entries
done level 1
2-grams: reading 397900 entries
done level 2
3-grams: reading 28396 entries
done level 3
4-grams: reading 15557 entries
done level 4
5-grams: reading 8777 entries
done level 5
done
starting to use OOV words []
OOV code is 89171
IRST: m_unknownId=89171
Loading Distortion0
Loading LexicalReordering0
Loading table into memory...done.
Loading WordPenalty0
Loading UnknownWordPenalty0
Loading PhraseDictionaryMemory0
Start loading text phrase table. Moses format : [35.255] seconds
Reading /home/hieu/workspace/github/working/india/en-mr/phrase-table.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Exception: moses/Phrase.cpp:214 in void Moses::Phrase::CreateFromString(Moses::FactorDirection, const std::vector&, const StringPiece&, Moses::Word**) threw util::Exception because `nextPos == string::npos'.
Incorrect formatting of non-terminal. Should have 2 non-terms, eg. [X][X]. Current string: [कलकत्ता]

Please help.
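The characters that clash with Moses's own markup are `[` and `]` (non-terminal delimiters), `|` (the factor and field separator), and the XML-sensitive `& < > ' "`. The sketch below mirrors the substitutions performed by escape-special-chars.perl; the exact entity choices here are an assumption based on that script, so verify against your Moses checkout before relying on them:

```python
# Order matters: '&' must be escaped first, otherwise the entities
# introduced by later substitutions would themselves be re-escaped.
MOSES_ESCAPES = [
    ('&', '&amp;'),
    ('|', '&#124;'),   # '|' separates factors and phrase-table fields
    ('<', '&lt;'),
    ('>', '&gt;'),
    ("'", '&apos;'),
    ('"', '&quot;'),
    ('[', '&#91;'),    # '[' and ']' delimit non-terminals
    (']', '&#93;'),
]

def escape_moses(line):
    """Escape characters that Moses would otherwise parse as markup."""
    for raw, esc in MOSES_ESCAPES:
        line = line.replace(raw, esc)
    return line

# The bracketed word from the error message becomes plain text again:
print(escape_moses('[कलकत्ता]'))  # -> &#91;कलकत्ता&#93;
```

Applied to the training data before alignment (or via tokenizer.perl, which escapes as it tokenizes), the phrase table will no longer contain tokens that look like malformed non-terminals.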
Re: [Moses-support] Data collection
Hi,

the common training pipeline limits sentences to at most 80 words. This is due to limitations in GIZA++. There can be any mix of sentence lengths - long sentences, short sentences, single words. There is a good chance for the system to translate "I eat an apple" correctly if there is a training sentence pair such as "I eat an apple on Friday and an orange on Saturday."

-phi

On Tue, Apr 19, 2016 at 6:15 AM, Sanjanashree Palanivel <sanjanash...@gmail.com> wrote:
> Hi,
>
> How should the data be collected for training Moses?
>
> I wish to know how long or short the sentences can be for training Moses.
>
> What happens if simple sentences like "I eat an apple" are given for training along with longer sentences?
>
> And what if I give a single word as a sentence in the data?
>
> --
> Thanks and regards,
> Sanjanasri J.P
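The 80-word cap Philipp mentions is normally enforced by the clean-corpus step before GIZA++ alignment. A minimal sketch of that filtering, in Python rather than the actual clean-corpus-n.perl script (the real script also drops pairs with extreme length ratios, which is omitted here):

```python
def clean_corpus(src_lines, tgt_lines, min_len=1, max_len=80):
    """Keep only sentence pairs where both sides have between min_len and
    max_len tokens - the constraint GIZA++ training needs."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept

# A short sentence survives; a 100-token sentence is dropped.
pairs = clean_corpus(
    ['I eat an apple', 'w ' * 100],
    ['ich esse einen Apfel', 'x ' * 100],
)
print(len(pairs))  # -> 1
```

Single-word "sentences" pass this filter too (min_len=1), which matches the advice above that any mix of lengths is fine.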
[Moses-support] Data collection
Hi,

How should the data be collected for training Moses?

I wish to know how long or short the sentences can be for training Moses.

What happens if simple sentences like "I eat an apple" are given for training along with longer sentences?

And what if I give a single word as a sentence in the data?

--
Thanks and regards,
Sanjanasri J.P
Re: [Moses-support] KenLM scoring of long target phrases
Hi,

any words beyond N-1 have full context and are included in the phrase's score. So it's hypothesis + target phrase + adjustments. And the routine you cite is computing the adjustments.

Kenneth

On 04/19/16 10:50, Evgeny Matusov wrote:
> Hi,
>
> my colleagues and I noticed the following in the KenLM code when a hypothesis is evaluated with the LM:
>
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203
>
> Do we understand it correctly that because of this line, for phrases longer than the LM order N only the first N words are scored with the LM, and the subsequent words are not scored? At least I don't see a call to add their scores anywhere; they are just passed on to update the LM state in lines 222-225.
>
> Please clarify. It seems like a phrase should be scored by the LM completely, otherwise longer phrases which start with frequent n-grams but have unlikely word sequences afterwards would be wrongly preferred. Also, longer phrases would be preferred in general with such scoring.
>
> Thanks,
> Evgeny.
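To make Kenneth's decomposition concrete: within a phrase of order-N LM, only the first N-1 words lack full context at phrase-scoring time; every later word sees its complete N-1-word history inside the phrase, so its score is final. The toy below is a stand-in illustration of that scheme, not KenLM's API (`lm_prob` is a dummy lookup):

```python
import math

N = 3  # pretend trigram LM

def lm_prob(word, context):
    # Stand-in for a real LM lookup; returns a dummy probability that
    # depends only on how much context is available.
    return 0.1 / (1 + len(context))

def score_phrase(phrase):
    """Score every word of the phrase with whatever context the phrase
    itself provides. Words at positions >= N-1 have full context, so
    their terms are final; the first N-1 terms are provisional and get
    corrected ("adjustments") once the hypothesis prefix is known."""
    total = 0.0
    for i, word in enumerate(phrase):
        context = phrase[max(0, i - (N - 1)):i]  # truncated at phrase start
        total += math.log10(lm_prob(word, context))
    return total

# All four words contribute a term - none are skipped:
print(round(score_phrase(['a', 'b', 'c', 'd']), 3))  # -> -5.255
```

So a long phrase with an unlikely tail is penalized by the phrase-internal terms; the code at Ken.cpp line 203 is only computing the later corrections for the provisional prefix terms, not skipping words.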
[Moses-support] KenLM scoring of long target phrases
Hi,

my colleagues and I noticed the following in the KenLM code when a hypothesis is evaluated with the LM:

https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203

Do we understand it correctly that because of this line, for phrases longer than the LM order N only the first N words are scored with the LM, and the subsequent words are not scored? At least I don't see a call to add their scores anywhere; they are just passed on to update the LM state in lines 222-225.

Please clarify. It seems like a phrase should be scored by the LM completely, otherwise longer phrases which start with frequent n-grams but have unlikely word sequences afterwards would be wrongly preferred. Also, longer phrases would be preferred in general with such scoring.

Thanks,
Evgeny.