Hi Marcin

There's a facility to include a weight in the extract file, which is 
then used in phrase scoring. It appears that the sentence id is being 
read as that weight. This is the problem with the extract file not 
having any metadata: nothing says what the extra column means.
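
To check the arithmetic in the report below, here is a minimal Python 
sketch. It is not the Moses scorer, only an illustration: it assumes the 
last column of each extract line is read as a per-occurrence weight and 
summed. The names EXTRACT and last_column_sum are made up for this 
example.

```python
# The six extract lines from the report, unwrapped.
# Layout: source ||| target ||| alignment ||| extra column (here the sentence id).
EXTRACT = """\
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
#c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318
"""


def last_column_sum(lines):
    """Sum the last |||-separated field of each line, treating it as a weight.

    Assumption for illustration only: this mimics what a scorer would do
    if it took the extra column to be a weight rather than an id.
    """
    total = 0.0
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        total += float(fields[-1])
    return total


lines = EXTRACT.strip().splitlines()
occurrences = len(lines)          # plain count of the phrase pair
weighted = last_column_sum(lines)  # "count" if the ids are taken as weights

print(occurrences)         # 6
print(f"{weighted:.6g}")   # 1.14604e+07
```

The two printed values match the 6 and 1.14604e+07 in the counts field 
of the phrase table entry, which is consistent with the ids being summed 
as weights.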

cheers - Barry

On 23/07/14 11:36, Marcin Junczys-Dowmunt wrote:
> Hi,
> I am using train-model.perl with
>
>    --extract-options="--IncludeSentenceId"
>
> and it seems that the sentence id is somehow getting into the phrase
> table as a count and is later used to calculate the phrase translation
> weights. For instance, this extract (last column is the id):
>
> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374618
> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374619
> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374620
> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374621
> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 1374622
> #c the compound or process ||| #c verbindung oder verfahren ||| 0-0 2-1 3-2 4-3 ||| 4587318
>
> results in a phrase table entry like this:
>
> #c the compound or process ||| #c verbindung oder verfahren ||| 1 0.0100206 5.23542e-07 0.524577 ||| 0-0 2-1 3-2 4-3 ||| 6 1.14604e+07 6 ||| |||
>
> The count is equal to the sum of the sentence ids, which of course
> makes the phrase probability useless.
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
