Re: [Moses-support] tree-to-string problem

2016-04-19 Thread Hieu Hoang
Your training data should be in a format that Moses understands, e.g.
   <tree label="NP"> <tree label="DET"> the </tree> <tree label="NN"> cat </tree> </tree>
Currently, it looks like the training data is whatever came out of the
parser.

The syntax tutorial has a bit more information
http://www.statmt.org/moses/?n=Moses.SyntaxTutorial

On 18/04/2016 14:07, Annette Rios wrote:
> Hi all
>
> I'm trying to build a tree-to-string system, and I get this error from
> moses_chart:
>
> Exception: moses/Phrase.cpp:214 in void
> Moses::Phrase::CreateFromString(Moses::FactorDirection, const
> std::vector<Moses::FactorType>&, const StringPiece&, Moses::Word**)
> threw util::Exception because `nextPos == string::npos'.
> Incorrect formatting of non-terminal. Should have 2 non-terms, eg.
> [X][X]. Current string: [SP]
>
> The corresponding lines in the phrase table look like this:
>
> [S [AQ asumidos] [cag [sp [SP con] [sn [NP áfrica]]] [conj [CC y]] [SP]
> [sn [NP áfrica ||| und [X][X] Afrika [X] ||| 0.0874939 0.69856
> 0.174988 0.36 0.606531 ||| 3-0 4-1 5-2 ||| 4 2 2 ||| |||
> [S [AQ asumidos] [cag [sp [SP con] [sn [NP áfrica]]] [conj [CC y]] [SP]
> [sn [NP ||| und [X][X] [X][X] [X] ||| 0.00185172 0.838272 0.174988
> 0.865553 0.606531 ||| 3-0 4-1 5-2 ||| 189 2 2 ||| |||
> [S [AQ asumidos] [cag [sp [SP con] [sn [NP áfrica]]] [conj [CC y]] [SP]
> [sn]]] ||| und [X][X] [X][X] [X] ||| 0.00185172 0.838272 0.174988
> 0.865553 0.606531 ||| 3-0 4-1 5-2 ||| 189 2 2 ||| |||
>
>
> extracted from this parse:
>
> 4  asumidos  asumido  a  AQ  gen=m|num=p|postype=qualificative|eagles=AQ0MPP  3  S  _  _
> 5  con  con  s  SP  postype=preposition|eagles=SPS00  8  sp  _  _
> 6  áfrica  áfrica  n  NP  postype=proper||eagles=NP0  5  sn  _  _
> 7  y  y  c  CC  postype=coordinating|eagles=CC  8  conj  _  _
> 8  por  por  s  SP  postype=preposition|eagles=SPS00  4  cag  _  _
> 9  áfrica  áfrica  n  NP  postype=proper||eagles=NP0  8  sn  _  _
>
> converted to xml with conll2mosesxml.py:
>
> <tree label="S">
>   <tree label="AQ"> asumidos </tree>
>   <tree label="cag">
>     <tree label="sp">
>       <tree label="SP"> con </tree>
>       <tree label="sn">
>         <tree label="NP"> áfrica </tree>
>       </tree>
>     </tree>
>     <tree label="conj">
>       <tree label="CC"> y </tree>
>     </tree>
>     <tree label="SP"> por </tree>
>     <tree label="sn">
>       <tree label="NP"> áfrica </tree>
>     </tree>
>   </tree>
> </tree>
>
>
> Is there something wrong in my parse trees that causes this?
>
> Best regards
>
> Annette
>

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase Table error

2016-04-19 Thread Hieu Hoang

the word [कलकत्ता] has been mistaken for a non-terminal by Moses.

You must escape your training and input data before giving it to Moses. 
You can escape your data using the script

   scripts/tokenizer/escape-special-chars.perl
or with the tokenizer script
   scripts/tokenizer/tokenizer.perl
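
For example (a sketch; file names are placeholders for your own data):

   scripts/tokenizer/escape-special-chars.perl < train.mr > train.esc.mr

The script rewrites the special characters as XML entities, so
[कलकत्ता] becomes &#91;कलकत्ता&#93; and is no longer parsed as a
non-terminal.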

On 18/04/2016 17:01, Akhilesh Gupta wrote:

Hello Sir,

I was trying to run moses using already generated models. But I got 
this error:


hieu@hieu-VirtualBox:~/workspace/github/working$ 
/home/hieu/workspace/github/mosesdecoder/bin/moses -f 
india/en-mr/moses.ini

Defined parameters (per moses.ini or switch):
config: india/en-mr/moses.ini
distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 
/home/hieu/workspace/github/working/india/en-mr/reordering-table.wbe-msd-bidirectional-fe.gz 


distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 5 /home/hieu/workspace/github/working/india/mr/mr.lm
mapping: 0 T 0
ttable-file: 0 0 0 5 
/home/hieu/workspace/github/working/india/en-mr/phrase-table.gz

ttable-limit: 20
weight-d: 0.0655118 0.10091 0.0237089 0.0746748 0.0667524 
0.0398009 0.0216711

weight-l: 0.15864
weight-t: 0.0294934 0.0740486 -6.53905e-05 0.00500778 0.281338
weight-w: 0.0583774
line=IRSTLM factor=0 order=5 num-features=1 
path=/home/hieu/workspace/github/working/india/mr/mr.lm

FeatureFunction: IRSTLM0 start: 0 end: 0
line=Distortion
FeatureFunction: Distortion0 start: 1 end: 1
line=LexicalReordering type=wbe-msd-bidirectional-fe-allff 
input-factor=0 output-factor=0 num-features=6 
path=/home/hieu/workspace/github/working/india/en-mr/reordering-table.wbe-msd-bidirectional-fe.gz

FeatureFunction: LexicalReordering0 start: 2 end: 7
Initializing LexicalReordering..
line=WordPenalty
FeatureFunction: WordPenalty0 start: 8 end: 8
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 9 end: 9
line=PhraseDictionaryMemory input-factor=0 output-factor=0 
path=/home/hieu/workspace/github/working/india/en-mr/phrase-table.gz 
num-features=5 table-limit=20

FeatureFunction: PhraseDictionaryMemory0 start: 10 end: 14
Loading IRSTLM0
In LanguageModelIRST::Load: nGramOrder = 5
Language Model Type of 
/home/hieu/workspace/github/working/india/mr/mr.lm is 1

Language Model Type is 1
\data\
loadtxt_ram()
1-grams: reading 89171 entries
done level 1
2-grams: reading 397900 entries
done level 2
3-grams: reading 28396 entries
done level 3
4-grams: reading 15557 entries
done level 4
5-grams: reading 8777 entries
done level 5
done
starting to use OOV words []
OOV code is 89171
OOV code is 89171
OOV code is 89171
IRST: m_unknownId=89171
Loading Distortion0
Loading LexicalReordering0
Loading table into memory...done.
Loading WordPenalty0
Loading UnknownWordPenalty0
Loading PhraseDictionaryMemory0
Start loading text phrase table. Moses format : [35.255] seconds
Reading /home/hieu/workspace/github/working/india/en-mr/phrase-table.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Exception: moses/Phrase.cpp:214 in void 
Moses::Phrase::CreateFromString(Moses::FactorDirection, const 
std::vector<Moses::FactorType>&, const StringPiece&, Moses::Word**) threw 
util::Exception because `nextPos == string::npos'.
Incorrect formatting of non-terminal. Should have 2 non-terms, eg. 
[X][X]. Current string: [कलकत्ता]


Please Help.





___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Data collection

2016-04-19 Thread Philipp Koehn
Hi,

the common training pipeline limits sentences to at most 80 words.
This is due to limitations in GIZA++.
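In the standard recipe this limit is applied by the cleaning script, e.g.
(file names and language suffixes here are placeholders):

   scripts/training/clean-corpus-n.perl corpus src tgt corpus.clean 1 80

which reads corpus.src/corpus.tgt, drops sentence pairs where either side
is empty or longer than 80 tokens, and writes corpus.clean.src and
corpus.clean.tgt.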

There can be any mix of sentence lengths - long sentences, short
sentences, single words.

There is a good chance for the system to translate "I eat an apple"
correctly if it has seen a training sentence pair with "I eat an apple
on Friday and an orange on Saturday."

-phi

On Tue, Apr 19, 2016 at 6:15 AM, Sanjanashree Palanivel <
sanjanash...@gmail.com> wrote:

> Hi,
>
> How should the data be collected for training Moses?
>
> I wish to know how long or short the sentences can be for training
> Moses.
>
> What will happen if simple sentences like "I eat an apple" are given
> for training together with longer sentences?
>
> And what if I give a single word as a sentence in the data?
>
>
>
> --
> Thanks and regards,
>
> Sanjanasri J.P
>
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Data collection

2016-04-19 Thread Sanjanashree Palanivel
Hi,

   How should the data be collected for training Moses?

   I wish to know how long or short the sentences can be for training
Moses.

What will happen if simple sentences like "I eat an apple" are given
for training together with longer sentences?

And what if I give a single word as a sentence in the data?



-- 
Thanks and regards,

Sanjanasri J.P
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] KenLM scoring of long target phrases

2016-04-19 Thread Kenneth Heafield
Hi,

Any words beyond the first N-1 have full context and are included in the
phrase's score.  So the total is hypothesis + target phrase + adjustments,
and the routine you cite is computing the adjustments.
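
Concretely (a sketch of the decomposition, not the exact code path): with
a trigram LM (N = 3) and target phrase w1 w2 w3 w4 w5, the phrase's own
score already contains the full-context terms

  p(w3 | w1 w2) * p(w4 | w2 w3) * p(w5 | w3 w4)

plus provisional truncated-context terms for the first N-1 words,

  p(w1) * p(w2 | w1).

Once the phrase is appended to a hypothesis ending in "... u v", the
adjustment replaces the provisional terms with

  p(w1 | u v) * p(w2 | v w1),

so every word ends up scored with its full context, and nothing beyond
the first N-1 words needs re-scoring.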

Kenneth

On 04/19/16 10:50, Evgeny Matusov wrote:
>
> Hi,
>
>
> my colleagues and I noticed the following in the KenLM code when a
> Hypo is evaluated with the LM:
>
>
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203
>
>
> Do we understand it correctly that, because of this line, for phrases
> longer than the LM order N only the first N words are scored with the
> LM and the subsequent words are not scored?  At least I don't see a call
> to add their scores anywhere; they are just passed on to update the LM
> state in lines 222-225.
>
>
> Please clarify. It seems like a phrase should be scored by the LM
> completely; otherwise, longer phrases which start with frequent
> n-grams but have unlikely word sequences afterwards are wrongly
> preferred. Also, longer phrases are preferred in general with such
> scoring.
>
>
> Thanks,
>
>
> Evgeny.
>
>
>
>

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] KenLM scoring of long target phrases

2016-04-19 Thread Evgeny Matusov
Hi,


my colleagues and I noticed the following in the KenLM code when a Hypo is 
evaluated with the LM:


https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203


Do we understand it correctly that, because of this line, for phrases longer 
than the LM order N only the first N words are scored with the LM and the 
subsequent words are not scored?  At least I don't see a call to add their 
scores anywhere; they are just passed on to update the LM state in lines 
222-225.


Please clarify. It seems like a phrase should be scored by the LM completely; 
otherwise, longer phrases which start with frequent n-grams but have unlikely 
word sequences afterwards are wrongly preferred. Also, longer phrases are 
preferred in general with such scoring.


Thanks,


Evgeny.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support