[Moses-support] wrong alignment

2010-09-24 Thread musa ghurab

Hi

 

I trained a Chinese-Arabic system, but many of the alignments are wrong.
The same is true of the lexical model, where many words are wrongly aligned.
Here is an example from the lexical model (lex.e2f):


Note: the correct translation is marked with (**).


今天 وهنا 0.0009911
今天 ما 0.110
今天 الآن 0.0003424
今天 وقال 0.0001732
今天 الخط 0.0056625
今天 يزالون 0.0046512
今天 تم 0.496
今天 هذا 0.0001187
今天 سابق 0.0004292
今天 يأت 0.0094340
今天 الخاسر 0.0188679
今天 المحلولة 0.200
今天 السبت 0.0096154
今天 أكون 0.0016247
今天 نعلم 0.0003154
今天 ان 0.560
今天 ننطلق 0.020
今天 الظهر 0.0029762
今天 الصباح 0.4434348
今天 مثلما 0.0022779
今天 نفعله 0.0013316
今天 لدينا 0.264
今天 ادلى 0.017
今天 يوم 0.0029304 **
今天 عنها 0.0006026
今天 عالم 0.0075829
今天 برودي 0.0007008
今天 انها 0.819

 

Any suggestions for solving this problem, please?
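
A quick sanity check on the listing above: the scores for a fixed 今天 add up to far more than 1, so they cannot be a distribution over Arabic words given 今天. One plausible reading (an assumption, not something confirmed in this thread) is that each score is conditioned on the word in the second column, in which case a rare Arabic word that happened to be aligned with 今天 a few times gets a high score. The sketch below only sums scores per column; the rows are copied from the listing and the function name is made up for illustration.

```python
from collections import defaultdict

# A few rows copied from the listing above: first word, second word, score.
ROWS = """\
今天 تم 0.496
今天 ان 0.560
今天 الصباح 0.4434348
今天 يوم 0.0029304
今天 انها 0.819
"""

def column_sums(text, column):
    """Sum the scores of all rows that share the same word in `column` (0 or 1)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        totals[fields[column]] += float(fields[2])
    return dict(totals)

by_first = column_sums(ROWS, 0)   # grouped by the Chinese word
by_second = column_sums(ROWS, 1)  # grouped by the Arabic word

# If the table were p(second | first), the sum for 今天 could not exceed 1.
print("sum over rows with 今天 in column 1:", by_first["今天"])
print("largest per-Arabic-word sum:", max(by_second.values()))
```

If the per-first-word sum is above 1, sorting the file by the second column and checking the sums there is a cheap way to see which direction the table encodes.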
  ___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] wrong alignment

2010-09-24 Thread musa ghurab

Raphael, 
Thank you for your suggestion.


That will take more time to do. Are there any other suggestions?

 

 

Best regards



 
> Subject: Re: [Moses-support] wrong alignment
> From: rpa...@alphacrc.com
> To: mossaghu...@hotmail.com
> Date: Fri, 24 Sep 2010 18:03:53 +0100
> 
> Hi
> 
> I don't speak Chinese or Arabic, but if I understand your problem
> correctly, it is not really solvable or even supposed to be solved with
> current statistical machine translation systems. Or, if you prefer, the
> current solution is: give more input data, maybe the alignments will be
> better.
> 
> A statistical system is not supposed to have only one exact translation
> for each word, or for each phrase. Phrase tables always contain a lot of
> noise, but on the whole, given that bad translations should have bad
> scores, and that the language model should play its role, the
> translation should be ok. In your case, the fact that the correct
> translation doesn't even have a good score is certainly worrying.
> 
> If you can find correct word alignments somewhere, you can add them to
> giza's input. There is currently some work on an option to add a
> dictionary as input; for now you can just add them as "monoword
> sentences" to the input corpus. But anyway, giza's output is not
> expected to contain only perfect alignments.
> 
> Best regards,
> 
> -- 
> Raphael
> 
> 
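
The "monoword sentences" idea above can be sketched as follows. This is a minimal illustration, not part of Moses or GIZA++: the function name and the toy data are made up, and it assumes the parallel corpus is held as two lists of lines, one per language, with matching line numbers.

```python
def add_monoword_sentences(src_lines, tgt_lines, dictionary):
    """Return copies of both corpus sides with one extra line pair per dictionary entry.

    `dictionary` is an iterable of (source_word, target_word) pairs; each pair
    becomes a one-word "sentence" on each side, so the aligner sees it as a
    (trivially aligned) sentence pair.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("both corpus sides must have the same number of lines")
    new_src = list(src_lines)
    new_tgt = list(tgt_lines)
    for src_word, tgt_word in dictionary:
        new_src.append(src_word)
        new_tgt.append(tgt_word)
    return new_src, new_tgt

# Toy data for illustration only; the pair is the one marked (**) in the thread.
src = ["今天 星期六"]
tgt = ["يوم السبت"]
src2, tgt2 = add_monoword_sentences(src, tgt, [("今天", "يوم")])
print(len(src2), len(tgt2))
```

Repeating an entry several times would give it more weight in the counts; how many repetitions are sensible depends on the corpus size and is left open here.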


Re: [Moses-support] wrong alignment

2010-09-24 Thread John Burger
musa ghurab wrote:

> I trained a system of Chinese-Arabic language, but many alignments  
> are wrong.
> The same thing to lexical model, where are many words are wrongly  
> aligned
> Here is an example of lexical model (lex.e2f):

The point of Moses is not to get good alignments, but to get good  
translation output.  The target language model will help the decoder  
to pick good translations, even if the translation probabilities that  
come out of the alignment do not appear to be ideal.  A great deal of  
research effort has been wasted (in my opinion) on getting better  
alignments, without actually achieving better translation.

Have you run the resulting models on a test set?  What was the score?   
How big is your language model?  More LM data is probably the easiest  
way to make up for what might appear to be poor alignments.

- John D. Burger
   MITRE
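
To illustrate the point about the language model, here is a toy sketch of how a decoder-style score can prefer a candidate with a low lexical probability. The two lexical scores are taken from the listing in the original post; the language model probabilities and the equal weights are invented for illustration, and a real Moses model combines more features than these two.

```python
import math

def combined_score(tm_prob, lm_prob, w_tm=1.0, w_lm=1.0):
    """Weighted sum of log-probabilities, as in a log-linear model."""
    return w_tm * math.log(tm_prob) + w_lm * math.log(lm_prob)

# (candidate, lexical score from the thread, hypothetical LM probability in context)
candidates = [
    ("انها", 0.819, 1e-6),     # high lexical score, but assumed unlikely in context
    ("يوم", 0.0029304, 1e-2),  # low lexical score, but assumed fluent in context
]

scored = {word: combined_score(tm, lm) for word, tm, lm in candidates}
best = max(scored, key=scored.get)
print(best)
```

With these made-up numbers the fluent candidate wins despite its much lower lexical score, which is the effect described above; whether it happens on real data depends on the actual language model.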



Re: [Moses-support] wrong alignment

2010-09-24 Thread musa ghurab


Thanks, Burger.

Here is some information:
Language model:   45MB
Phrase table:     26MB
Reordering model: 36MB

I'm still waiting for tuning to finish, though.

 

 

 


Re: [Moses-support] wrong alignment

2010-09-24 Thread Miles Osborne
It is probably more helpful to give the number of sentences you used
for language model training (and other details, e.g. n-gram order).

At first glance, though, that looks like a tiny amount of language model
data; I would expect to see something closer to 2GB or so, depending
upon the representation.

Miles
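
For reference, the sentence and token counts asked for above can be obtained with a few lines of code. This is a generic sketch that assumes one whitespace-tokenized sentence per line; the function name is made up.

```python
def corpus_stats(lines):
    """Return (number of non-empty sentences, number of whitespace-separated tokens)."""
    sentences = 0
    tokens = 0
    for line in lines:
        words = line.split()
        if not words:
            continue  # ignore blank lines
        sentences += 1
        tokens += len(words)
    return sentences, tokens

# Usage with a file (path is hypothetical):
#   with open("lm-training.ar", encoding="utf-8") as f:
#       print(corpus_stats(f))
print(corpus_stats(["a b c", "", "d e"]))
```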




-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] wrong alignment

2010-09-24 Thread musa ghurab


Thanks, Miles.

 

Language model options:

-order 5 -interpolate -kndiscount -unk


Phrase table training options:

-alignment grow-diag-final
-reordering msd-bidirectional-fe
-mgiza -mgiza-cpus 8


 

best regards


 


[Moses-support] filter-pt doesn't work?

2010-09-24 Thread Christof Pintaske
  Hi,

I just updated my Moses installation to trunk. Unfortunately, I found
that filter-pt now crashes instead of pruning. The patch below stopped
the bleeding for me. However, even with that patch I get plenty of
error messages:

 No occurrences found

and the pruned table is too small to be plausible. Does filter-pt get
out of step because the phrase table now has 5 records instead of 3? (My
old installation is from June.) filter-pt.cpp seems to be completely
unchanged compared to the June installation.

Any hints or fixes are welcome.

best regards
Christof



diff -wc sigtest-filter/filter-pt.cpp ../moses-2010-06-04/sigtest-filter/filter-pt.cpp
*** sigtest-filter/filter-pt.cpp  2010-09-24 13:19:34.0 -0700
--- ../moses-2010-06-04/sigtest-filter/filter-pt.cpp  2010-06-04 15:33:39.0 -0700
***************
*** 103,111 ****
   }
   }
   }
- if (i != scores.end()) {
   ++i;
- }
   char f[24];
   char *fp=f;
   while (i != scores.end() && *i != ' ') {



Re: [Moses-support] filter-pt doesn't work?

2010-09-24 Thread Christof Pintaske

 Hi,

It seems that scores and extra have swapped positions in the phrase table.
The attached patch got me a lot further along; I changed the order in
the output as well.


I'm not sure whether

if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;

if (print_neglog_significance) os << " ||| " << pp.nlog_pte;

still print things in the correct order (filter-pt.cpp, lines 144-145).

best regards
Christof
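
As a language-neutral illustration of the field order discussed above, the sketch below splits a phrase-table line on the " ||| " separator and treats the third field as the scores and anything after it as extra fields. The layout is inferred from the patched parsing code in this thread, not from the Moses documentation, and the sample line and function name are hypothetical.

```python
SEPARATOR = " ||| "

def parse_entry(line):
    """Split one phrase-table line into source, target, scores and extra fields."""
    fields = line.rstrip("\n").split(SEPARATOR)
    if len(fields) < 3:
        raise ValueError("expected at least 3 fields, got %d" % len(fields))
    return {
        "source": fields[0],
        "target": fields[1],
        "scores": [float(x) for x in fields[2].split()],  # assumed: scores come third
        "extra": fields[3:],                              # whatever follows, untouched
    }

# Hypothetical entry with five fields.
entry = parse_entry("今天 ||| يوم ||| 0.5 0.25 0.5 0.25 2.718 ||| 0-0 ||| 10 12\n")
print(entry["scores"], entry["extra"])
```

Keeping the trailing fields as an opaque list means the parser does not break if the number of fields after the scores changes again.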





*** filter-pt.cpp  2010-09-24 14:45:19.0 -0700
--- /export/home/moses/src/moses-2010-06-04/sigtest-filter/filter-pt.cpp  2010-06-04 15:33:39.0 -0700
***************
*** 87,105 ****
  {
  size_t pos = 0;
  std::string::size_type nextPos = str.find(SEPARATOR, pos);
! this->f_phrase = str.substr(pos,nextPos);
!
! pos = nextPos + SEPARATOR.size();
! nextPos = str.find(SEPARATOR, pos);
! this->e_phrase = str.substr(pos,nextPos-pos);
!
! pos = nextPos + SEPARATOR.size();
  nextPos = str.find(SEPARATOR, pos);
! this->scores = str.substr(pos,nextPos-pos);
!
! pos = nextPos + SEPARATOR.size();
! this->extra = str.substr(pos);
!
  int c = 0;
  std::string::iterator i=scores.begin();
  if (index > 0) {
--- 87,98 ----
  {
  size_t pos = 0;
  std::string::size_type nextPos = str.find(SEPARATOR, pos);
! this->f_phrase = str.substr(pos,nextPos); pos = nextPos + SEPARATOR.size();
  nextPos = str.find(SEPARATOR, pos);
! this->e_phrase = str.substr(pos,nextPos-pos); pos = nextPos + SEPARATOR.size();
! nextPos = str.rfind(SEPARATOR);
! this->extra = str.substr(pos, ((nextPos > pos)?(nextPos-pos):0));
! this->scores = str.substr(nextPos + SEPARATOR.size(),std::string::npos);
  int c = 0;
  std::string::iterator i=scores.begin();
  if (index > 0) {
***************
*** 110,118 ****
  }
  }
  }
- if (i != scores.end()) {
  ++i;
- }
  char f[24];
  char *fp=f;
  while (i != scores.end() && *i != ' ') {
--- 103,109 ----
***************
*** 139,146 ****
  std::ostream& operator << (std::ostream& os, const PTEntry& pp)
  {
    os << pp.f_phrase << " ||| " << pp.e_phrase;
-   os << " ||| " << pp.scores;
    if (pp.extra.size()>0) os << " ||| " << pp.extra;
    if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
    if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
    return os;
--- 130,137 ----
  std::ostream& operator << (std::ostream& os, const PTEntry& pp)
  {
    os << pp.f_phrase << " ||| " << pp.e_phrase;
    if (pp.extra.size()>0) os << " ||| " << pp.extra;
+   os << " ||| " << pp.scores;
    if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
    if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
    return os;


Re: [Moses-support] wrong alignment

2010-09-24 Thread musa ghurab

Here is the score for the Chinese-Arabic system:

 

using: mteval-v12.pl

  Evaluation of cn-to-ar translation using:
src set "test2010" (1 docs, 1000 segs)
ref set "test2010" (1 refs)
tst set "test2010" (1 systems)

NIST score = 6.3938  BLEU score = 0.4120 for system "chinese-arabic"

 

