Well, Henry may clarify whether it is intended to be a single token or not.
But I agree that it wouldn't make much sense to translate a
placeholder-text-placeholder sequence if it is represented as one single
token (or at least I can't imagine why it would), while for other sequences,
such as dates or currencies, translation would make sense.
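
For what it's worth, here is a minimal sketch of the masking idea Daniel
describes below, in the style of the numbered {1}processor{2} placeholders:
replace each tag with a unique placeholder before decoding and put the
original tags back afterwards. The function names, the regex, and the {N}
format are my own assumptions for illustration (this is not m4loc's actual
code), and the decoder may of course still move the placeholders around,
which is exactly the problem discussed in this thread.

import re

# Matches any HTML/XML tag; an assumption for illustration only.
TAG_RE = re.compile(r"<[^>]+>")

def mask_tags(sentence):
    """Replace each tag with a unique numbered placeholder ({1}, {2}, ...)."""
    tags = []
    def repl(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)
    return TAG_RE.sub(repl, sentence), tags

def unmask_tags(translation, tags):
    """Substitute the original tags back in for the numbered placeholders."""
    for i, tag in enumerate(tags, start=1):
        translation = translation.replace("{%d}" % i, tag)
    return translation

masked, tags = mask_tags("<b>Processor</b>")  # -> "{1}Processor{2}"
# ... decode `masked` with Moses here ...
print(unmask_tags("{1}processeur{2}", tags))  # -> "<b>processeur</b>"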

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
Behalf Of John D Burger
Sent: 31 July 2012 17:25
To: Moses-support
Subject: Re: [Moses-support] Placeholder drift

I'm a little confused.  If the intent is for the
placeholder-text-placeholder sequence to be interpreted as a single token,
why would it be translated at all?  Isn't it likely to be seen as an unknown
word, as Daniel suggests (unless, of course, that exact same sequence occurs
in both the parallel and language modeling data)?

Sorry if I'm coming in late, and everybody already understands this.

- John Burger
  MITRE

On Jul 31, 2012, at 11:20, Daniel Schaut wrote:

> Hi,
> 
> This placeholder salad should occur only rarely if there are
> placeholders in your training and tuning sets as well as in the
> language model. Some time ago I experienced almost the same issue,
> though only with the chart decoder. You can try playing around with
> the -dl option (an example command is shown below). You can also try
> m4loc, as already suggested by Tomáš, if the data is in TMX or XLIFF
> format. Then your test set may look like this:
> 
> {1}processor{2}
> 
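> For example, something along these lines (a sketch, assuming a standard
> phrase-based setup; the file names are placeholders, and -dl 0 forces
> monotone decoding, i.e. no reordering at all, while larger values allow
> limited reordering):
> 
>   moses -f moses.ini -dl 0 < testset.src > testset.out
> 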
> If there are no placeholders in your sets, unknown words may cause some
> strange reordering, even though they are copied verbatim (see
> http://www.mail-archive.com/moses-support@mit.edu/msg02717.html).
> 
> What kind of reordering model are you using?
> 
> Daniel
> 
> -----Original Message-----
> From: moses-support-boun...@mit.edu
> [mailto:moses-support-boun...@mit.edu] On Behalf Of John D Burger
> Sent: 31 July 2012 16:09
> To: Henry Hu
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] Placeholder drift
> 
> Are there any such placeholders in your language modeling data and
> your parallel training data?  If not, all the models are going to
> treat them as unknown words.  In the case of the language model, it
> doesn't surprise me too much that the placeholders all get pushed
> together, as that will produce fewer discontiguous subsequences,
> which the language model will prefer.
> 
> - John Burger
>  MITRE
> 
> On Jul 31, 2012, at 03:05, Henry Hu wrote:
> 
>> Hi,
>> 
>> I use a model to translate English to French. First, I replaced HTML
>> tags such as <a> and <b> with the placeholder {}, like this:
>> 
>> {}Processor{}
>> 
>> Then I decoded. To my confusion, I got the result:
>> 
>> {}{} processeur
>> 
>> instead of {}processeur{}. Why did the placeholder move? How can I
>> keep it in place? Thanks for any suggestions.
>> 
>> Henry


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
