I'm a little confused.  If the intent is for the placeholder-text-placeholder 
sequence to be interpreted as a single token, why would it be translated at 
all?  Isn't it likely to be seen as an unknown word, as Daniel suggests (unless, 
of course, that exact same sequence occurs in both the parallel and language 
modeling data)?
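
To make the question concrete, here is a minimal Python sketch. It assumes a
plain whitespace tokenizer and a made-up toy vocabulary; neither is what Moses
does internally, and the name oov_tokens is hypothetical. It only illustrates
why a fused placeholder-text-placeholder sequence would count as unknown while
the separated pieces would not.

```python
def oov_tokens(sentence, vocabulary):
    """Return the whitespace-separated tokens of `sentence` not in `vocabulary`."""
    return [tok for tok in sentence.split() if tok not in vocabulary]


# Hypothetical toy vocabulary standing in for tokens seen in the training data.
vocabulary = {"Processor", "{}"}

# Written without spaces, the whole sequence is one token and is unknown.
fused = oov_tokens("{}Processor{}", vocabulary)

# Written with spaces, every piece is a known token.
spaced = oov_tokens("{} Processor {}", vocabulary)

print(fused)
print(spaced)
```

With the fused form the entire string is reported as out of vocabulary, so a
decoder would have nothing to translate it with; with the spaced form nothing
is out of vocabulary, but the placeholders become free-standing tokens.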

Sorry if I'm coming in late, and everybody already understands this.

- John Burger
  MITRE

On Jul 31, 2012, at 11:20 , Daniel Schaut wrote:

> Hi,
> 
> This placeholder salad should occur only very rarely if there are
> placeholders in your training and tuning sets as well as in the language
> model. Some time ago I experienced almost the same issue, but only with the
> chart decoder. You can try playing around with the -dl (distortion limit)
> option. Also, you can try m4loc, as already suggested by Tomáš, if the data
> is in TMX or XLIFF format. Then your test set may look like this:
> 
> {1}processor{2}
> 
> If there are no placeholders in your sets, unknown words may cause some
> strange reordering, although they are copied verbatim (see
> http://www.mail-archive.com/moses-support@mit.edu/msg02717.html).
> 
> What kind of reordering model are you using?
> 
> Daniel
> 
> -----Original Message-----
> From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
> Behalf Of John D Burger
> Sent: 31 July 2012 16:09
> To: Henry Hu
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] Placeholder drift
> 
> Are there any such placeholders in your language modeling data and your
> parallel training data?  If not, all the models are going to treat them as
> unknown words.  In the case of the language model, it doesn't surprise me
> too much that the placeholders all get pushed together, as that will produce
> fewer discontiguous subsequences, which the language model will prefer.
> 
> - John Burger
>  MITRE  
> 
> On Jul 31, 2012, at 03:05 , Henry Hu wrote:
> 
>> Hi,
>> 
>> I use a model to translate English to French. First, I replaced HTML 
>> tags such as <a>, <b>, with the placeholder {}, like this:
>> 
>> {}Processor{}
>> 
>> Then I ran the decoder. To my confusion, I got this result:
>> 
>> {}{} processeur
>> 
>> instead of {}processeur{}. Why did the placeholders move? How can I 
>> keep them in place? Thanks for any suggestions.
>> 
>> Henry
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 