Re: [Moses-support] Factored instead of Phrase-based Model?
Dear Shaimaa,

I don't understand which files are supposed to form the word-aligned, sentence-parallel training corpus. I expected three files with exactly the same number of lines, but the alignment file has only 4 lines while the corpora have 29 lines each:

     4 aligned.grow-diag-final
    29 car-ready2016-2.de
    29 car-ready2016-2.en
    45 phrase-table.gz
    37 verbose.docx

I'm attaching my script to visually present alignments like this:

    this   █ - - - -
    car    - █ - - -
    was    - - █ - -
    stolen - - - █ -
    .      - - - - █
           dieses  .
             auto
               wurde
                 gestohlen

To get the output, use:

    paste car-ready2016-2.en car-ready2016-2.de aligned.grow-diag-final | alitextview.pl | less

(I might have swapped the languages.)

If your training corpus now consists of the files

    car-ready2016-2.en
    car-ready2016-2.de
    aligned.grow-diag-final

then obviously "ich verkaufe" cannot be translated: it is never aligned to anything in the training data.

Cheers, O.
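The two checks described in this reply (equal line counts across the three files, and a grid view of one aligned sentence pair) can be sketched in Python. This is only an illustration, not the attached alitextview.pl script: the function names are made up, the alignment line "0-0 1-1 2-2 3-3 4-4" is a hypothetical example rather than a line from the real file, and treating the first index as the English (row) word is an assumption that may be swapped in the actual data.

```python
from typing import Dict, List, Tuple


def mismatched_files(line_counts: Dict[str, int]) -> List[str]:
    """Return the files whose line count differs from the largest count."""
    expected = max(line_counts.values())
    return sorted(name for name, count in line_counts.items() if count != expected)


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse alignment points such as '0-0 1-1' into (row, column) pairs."""
    points = []
    for token in line.split():
        row, col = token.split("-")
        points.append((int(row), int(col)))
    return points


def render_grid(rows: List[str], cols: List[str], points: List[Tuple[int, int]]) -> str:
    """Draw one text row per row word, marking aligned column positions."""
    linked = set(points)
    width = max(len(word) for word in rows)
    lines = []
    for r, word in enumerate(rows):
        cells = " ".join("█" if (r, c) in linked else "-" for c in range(len(cols)))
        lines.append(f"{word.ljust(width)} {cells}")
    return "\n".join(lines)


# Line counts as reported in the thread for the three training files.
counts = {
    "aligned.grow-diag-final": 4,
    "car-ready2016-2.de": 29,
    "car-ready2016-2.en": 29,
}
print(mismatched_files(counts))

# Hypothetical alignment line for the example sentence pair (assumption).
english = "this car was stolen .".split()
german = "dieses auto wurde gestohlen .".split()
grid = render_grid(english, german, parse_alignment("0-0 1-1 2-2 3-3 4-4"))
print(grid)
```

Any file reported by mismatched_files has fewer lines than the others, so the sentence pairs beyond that point have no alignment and cannot contribute phrase pairs.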
Re: [Moses-support] Factored instead of Phrase-based Model?
Dear Ondrej & Moses-Team,

@Ondrej: thanks a lot for your quick feedback.

The phrase "ich habe" does not appear in the phrase table. The word alignment file includes only the first 4 sentences of the training data.

I put the sentence (ich habe das auto verkauf) into a separate "in" file, but got the same result. I also tried another sentence (ich verkaufe das auto); here, too, "ich verkaufe" cannot be translated. I repeated the exact sentence (ich verkaufe das auto) many times in the training data and still get the same result.

I attach the word alignment, phrase table, training data and verbose result, and would be very grateful for any tip.

I would also greatly appreciate it if you could tell me where I can find information about:
1. how to prepare the training data with additional factors before training the Factored Model?
2. how to train a Language Model that considers the POS?

I think that sooner or later the sentences will get more complex and I will need to work with a Factored Model.

Many thanks,
Shaimaa

Attachments:
    aligned.grow-diag-final (binary data)
    phrase-table.gz (GNU Zip compressed data)
    verbose.docx (MS-Word 2007 document)
    car-ready2016-2.de (binary data)
    car-ready2016-2.en (binary data)

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Factored instead of Phrase-based Model?
Dear Shaimaa,

Adding factors can only increase any out-of-vocabulary issues.

Use -v (perhaps even a higher verbosity level) in moses to see which translation options are considered for the problematic sentence. There could be some unfortunate weight settings that for some reason prefer the identity translation. (The identity translation must, however, appear in the data, or the source word must not appear in the data; otherwise Moses would not produce the identity translation at all.)

Then go back to the phrase table and manually search for the lines that are supposed to cover the missing words. Here you may find the identity entries.

Then go back and check the word alignment this (test) sentence got in the training data. There are most likely some issues with the alignment that prevented proper translations from being extracted.

Best, Ondrej.

On January 6, 2016 4:48:26 AM CET, Shaimaa Marzouk wrote:
>Dear Moses-Team,
>
>I am trying to translate two short sentences included in the same file
>from German into English using a “Phrase-based Model”. The first
>sentence (das auto wurde verkauft) is translated correctly, while the
>second is only partly translated:
>
>I receive as a result for “ich habe das auto verkauft”
>Ich|UNK|UNK|UNK habe|UNK|UNK|UNK the car sold [1]
>[total=-203.330] core=(-200.000, -5.000, 5.000, 0.000, 0.000, 0.000,
>0.000, 0.000, -18.660)
>
>I tried to modify the training data in different ways, and finally
>included the exact sentence (along with its translation) in the
>training data (see attachment). But I still get the same result.
>
>Do I need to use a “Factored Translation Model” instead of the
>“Phrase-based Model” to be able to translate this sentence? If yes, I
>found an explanation of how to train Factored Models here:
>http://www.statmt.org/moses/?n=Moses.FactoredTutorial
>Could you please tell me where I can find information about:
>1. how to prepare the training data with additional factors before
>training the Factored Model?
>2. how to train a Language Model that considers the POS?
>
>I currently use KenLM and Giza++.
>
>Thanks a lot for your support.
>
>Kind regards,
>Shaimaa

--
Ondrej Bojar (mailto:o...@cuni.cz / bo...@ufal.mff.cuni.cz)
http://www.cuni.cz/~obo
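The manual phrase-table search suggested in the reply above can be sketched in Python, assuming the usual Moses text format in which the fields of each line are separated by "|||" and the first two fields are the source and target phrase. The function names and the three sample lines are hypothetical and were made up for illustration; they are not taken from the attached phrase-table.gz.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple


def entries_for(lines: Iterable[str], phrase: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) for every entry whose source side equals phrase."""
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) >= 2 and fields[0] == phrase:
            yield fields[0], fields[1]


def is_identity(source: str, target: str) -> bool:
    """An identity entry maps a phrase onto itself."""
    return source == target


def search_phrase_table(path: str, phrase: str) -> List[Tuple[str, str]]:
    """Search a gzip-compressed phrase table file for one source phrase."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(entries_for(handle, phrase))


# Hypothetical phrase-table lines (invented scores, alignments and counts).
sample = [
    "das auto ||| the car ||| 1 1 1 1 ||| 0-0 1-1 ||| 1 1 1",
    "ich ||| ich ||| 1 1 1 1 ||| 0-0 ||| 1 1 1",
    "wurde verkauft ||| was sold ||| 1 1 1 1 ||| 0-0 1-1 ||| 1 1 1",
]

for query in ["das auto", "ich", "ich habe"]:
    hits = list(entries_for(sample, query))
    if not hits:
        print(f"{query!r}: no entry, the phrase was never extracted")
    for source, target in hits:
        note = " (identity entry)" if is_identity(source, target) else ""
        print(f"{source!r} -> {target!r}{note}")
```

For a real table, search_phrase_table("phrase-table.gz", "ich habe") would return the matching entries; an empty list points back to the word alignment as the place to look next.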