Hi Vera,

It's odd that the lexical translation model contains such an entry if
the pair is always unaligned. Maybe you used a different word alignment
when you extracted the lexicon model?

You should inspect your word alignment manually to check whether its
quality is reasonable. Moses includes a visualization tool called
"Picaro":

$ moses/contrib/picaro/picaro.py -a1 model/aligned.1.grow-diag-final-and \
    -f model/aligned.1.0.de -e model/aligned.1.0.en
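
As a complement to visual inspection, a small script can list which target
words a given source word is linked to in the symmetrized alignment. This is
only a sketch: the helper names linked_words and count_partners are made up
for illustration, and it assumes the alignment file uses the usual
"source-target" index pairs (0-based, space-separated, one sentence per
line). The toy alignment strings in the usage comment are hypothetical.

```python
from collections import Counter


def linked_words(src_line, tgt_line, align_line, word):
    """For each occurrence of `word` on the source side, return the tuple
    of target words it is linked to (an empty tuple means unaligned)."""
    src = src_line.split()
    tgt = tgt_line.split()
    # Assumed format: "0-0 1-2 ..." with 0-based source-target indices.
    links = sorted(tuple(map(int, pair.split("-"))) for pair in align_line.split())
    result = []
    for i, token in enumerate(src):
        if token == word:
            result.append(tuple(tgt[j] for s, j in links if s == i))
    return result


def count_partners(src_path, tgt_path, align_path, word):
    """Count, over three parallel files, how `word` is aligned."""
    counts = Counter()
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(align_path, encoding="utf-8") as fa:
        for s, t, a in zip(fs, ft, fa):
            counts.update(linked_words(s, t, a, word))
    return counts


# Usage on a toy sentence pair (hypothetical alignment):
# linked_words("eine elektrische Gitarre ,", "an electric guitar ;",
#              "0-0 1-1 2-2 3-3", "Gitarre")
```

With the file names from the Picaro command above, the call would be
count_partners("model/aligned.1.0.de", "model/aligned.1.0.en",
"model/aligned.1.grow-diag-final-and", "Gitarre"). An entry with a single
partner, ("guitar",), would indicate a one-to-one link in that sentence.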

In order to find out whether the symmetrization heuristic is an issue
for you, you can compare the standard and inverse GIZA alignments with
the symmetrized alignment.
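
A rough sketch of such a comparison is given here. It assumes the
directional alignments are in the three-line GIZA++ output format, of which
the third line looks like "NULL ({ }) for ({ 1 }) ...", and that all link
sets have been brought into the same (source, target) orientation before
comparing. The function names parse_a3_line, flip and classify_links are
made up for illustration.

```python
import re

# One token of the alignment line: word ({ p1 p2 ... })
_A3_TOKEN = re.compile(r"(\S+)\s+\(\{\s*((?:\d+\s+)*)\}\)")


def parse_a3_line(line):
    """Parse the alignment line of a GIZA++-style entry.

    Returns (words, links). Each link is a 0-based pair
    (position in the other sentence, position of the word on this line).
    The leading NULL token is dropped.
    """
    tokens = _A3_TOKEN.findall(line)
    if tokens and tokens[0][0] == "NULL":
        tokens = tokens[1:]
    words = [word for word, _ in tokens]
    links = {(int(pos) - 1, j)
             for j, (_, positions) in enumerate(tokens)
             for pos in positions.split()}
    return words, links


def flip(links):
    """Swap the two sides, e.g. to bring the inverse direction into
    the same orientation as the standard direction."""
    return {(b, a) for a, b in links}


def classify_links(forward, inverse, symmetrized):
    """Say for each symmetrized link which directional alignment has it."""
    report = {"both": set(), "forward_only": set(),
              "inverse_only": set(), "neither": set()}
    for link in symmetrized:
        in_f, in_i = link in forward, link in inverse
        if in_f and in_i:
            report["both"].add(link)
        elif in_f:
            report["forward_only"].add(link)
        elif in_i:
            report["inverse_only"].add(link)
        else:
            report["neither"].add(link)
    return report
```

Links that appear only in one direction are the ones a symmetrization
heuristic decides to keep or discard, so they are the first place to look.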

Ways to experiment with word alignment quality include:

- Choosing a different symmetrization heuristic
- Modifying the GIZA settings, e.g. by training with a different number
of EM iterations or a different sequence of IBM/HMM models
- Using some other method for training word alignments, e.g. a
discriminative word aligner

Also, if the amount of parallel training data is small, you shouldn't be
surprised if you are not able to train reliable models.
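
For reference, the standard phrase extraction criterion only extracts a
phrase pair if no alignment link connects a word inside the pair with a word
outside of it. The sketch below illustrates that check. The function name
is_consistent is made up, and the links are taken from the one-directional
GIZA output quoted further down in this thread purely as an illustrative
stand-in; the symmetrized alignment actually used for extraction may
contain different links.

```python
def is_consistent(links, src_span, tgt_span):
    """Check the phrase-pair consistency criterion.

    links: set of 0-based (source index, target index) pairs.
    src_span, tgt_span: inclusive (start, end) index ranges.
    A pair is consistent if it contains at least one link and no link
    has exactly one end inside the pair.
    """
    s0, s1 = src_span
    t0, t1 = tgt_span
    inside = 0
    for s, t in links:
        s_in = s0 <= s <= s1
        t_in = t0 <= t <= t1
        if s_in != t_in:
            return False  # link crosses the phrase boundary
        if s_in:
            inside += 1
    return inside > 0


# Illustrative links (assumption: one GIZA direction, not symmetrized).
# de: für ein Musikinstrument wie eine elektrische Gitarre ,
# en: for a musical instrument , such as an electric guitar ;
links = {(0, 0), (1, 1), (3, 6), (4, 7), (5, 8), (2, 9), (6, 9), (7, 10)}

# "Gitarre" (de index 6) vs. "guitar" (en index 9)
single_word_ok = is_consistent(links, (6, 6), (9, 9))
# "Musikinstrument ... Gitarre" (2..6) vs. "as an electric guitar" (6..9)
longer_span_ok = is_consistent(links, (2, 6), (6, 9))
```

Under these assumed links, the single-word pair is rejected because "guitar"
is also linked to "Musikinstrument", which lies outside the pair, while the
longer span that covers both German words passes the check.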

Cheers,
Matthias


On Thu, 2014-11-27 at 14:45 +0100, Vera Aleksic, Linguatec GmbH wrote:
> Hi,
> 
> I have one more question:
> In the lex.e2f file there is a translation Gitarre->guitar:
> 
>       Gitarre guitar 0.4000000
>       Gitarre using 0.0000284
>       Gitarre ; 0.0000017
> 
> Why has it not become part of the phrase table?
> 
> Thanks again!
> Vera
> 
> -----Original Message-----
> From: Vera Aleksic, Linguatec GmbH 
> Sent: Thursday, 27 November 2014 09:42
> To: 'Matthias Huck'; Raj Dabre
> Subject: RE: [Moses-support] Unknown single words that are part of phrases
> 
> Hi,
> Thank you for your answers.
> @Raj, single-word translations do not exist; I have searched for them. If the 
> grow-diag method is the likely cause of this, are there any better 
> alternatives?
> @Matthias, you are right, the pair Gitarre-guitar is always unaligned, but I 
> do not really understand why. Why is "guitar" in the example below aligned to 
> "Musikinstrument Gitarre", and not to "Gitarre" only? I assume that decomposing 
> "Musik + Instrument" would help? How else could I improve the word alignment 
> quality?
> Thanks!
> Best,
> Vera
> 
> für ein Musikinstrument wie eine elektrische Gitarre ,
> NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
> 
> -----Original Message-----
> From: Matthias Huck [mailto:mh...@inf.ed.ac.uk]
> Sent: Wednesday, 26 November 2014 17:54
> To: Raj Dabre
> Cc: Vera Aleksic, Linguatec GmbH; moses-support
> Subject: Re: [Moses-support] Unknown single words that are part of phrases
> 
> Hi,
> 
> Presumably your phrase table does not contain an entry "Gitarre ||| guitar" 
> because this word pair is always unaligned in your training data. You could 
> try to improve your word alignment quality.
> 
> Alternatively, you could implement a procedure in the manner of the "forced 
> single word heuristic" as described in: 
> D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to 
> Jane, an Open Source Hierarchical Translation Toolkit. The Prague Bulletin of 
> Mathematical Linguistics, number 95, pages 5-18, Prague, Czech Republic, 
> April 2011.
> http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
> (see Fig. 1c).
> 
> But the latter would rather be a workaround.
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> > Hello,
> > 
> > 
> > If I am not wrong, this is most likely due to the grow(-diag) method 
> > applied to the word-aligned data (both directions) before phrase extraction.
> > 
> > Furthermore, one-word translations should exist (though not always); 
> > search for them.
> > 
> > 
> > 
> > Regards.
> > 
> > 
> > On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH 
> > <v.alek...@linguatec.de> wrote:
> >         Hi,
> >         
> >         I have observed many times that some words do not exist as
> >         single-word translations in the phrase table, although they occur
> >         in the training corpus and in multiword phrases.
> >         An example:
> >         The German-English translation for "Gitarre" is unknown, i.e. there
> >         is no single-word entry for "Gitarre" in the phrase table, although
> >         some other phrases containing this word exist (see below).
> >         How is this possible?
> >         Thanks and best regards,
> >         Vera
> >         
> >         
> >         Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> >         Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> >         Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
> >         Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
> >         Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
> >         Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> >         Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> >         eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
> >         einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> >         einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> >         einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
> >         elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
> >         wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
> >         
> >         _______________________________________________
> >         Moses-support mailing list
> >         Moses-support@mit.edu
> >         http://mailman.mit.edu/mailman/listinfo/moses-support
> > 
> > 
> > 
> > --
> > Raj Dabre.
> > Research Student,
> > 
> > Graduate School of Informatics,
> > Kyoto University.
> > CSE MTech, IITB., 2011-2014
> > 
> > 
> 
> 
> 
> --
> The University of Edinburgh is a charitable body, registered in Scotland, 
> with registration number SC005336.
> 
> 


